Convolutional neural network with spatial pyramid pooling for hand gesture recognition

Tan, Yong Soon; Lim, Kian Ming; Connie, Tee; Lee, Chin Poo; Low, Cheng Yaw

doi:10.1007/s00521-020-05337-0

Cited by 74 publications

(31 citation statements)

References 33 publications

“…ASPP exploits and preserves the fine details around occlusions. Tan et al [20] yielded a fixed-length feature representation using spatial pyramid pooling, which recognizes hand gestures regardless of input size. This method facilitates the propagation of gradients from the final fully connected layer to the input layer.…”

Section: Receptive Field (mentioning)

confidence: 99%

RFaNet: Receptive Field-Aware Network with Finger Attention for Fingerspelling Recognition Using a Depth Sensor

et al. 2021

Automatic fingerspelling recognition tackles the communication barrier between deaf and hearing individuals. However, the accuracy of fingerspelling recognition is reduced by high intra-class variability and low inter-class variability. In the existing methods, regular convolutional kernels, which have limited receptive fields (RFs) and often cannot detect subtle discriminative details, are applied to learn features. In this study, we propose a receptive field-aware network with finger attention (RFaNet) that highlights the finger regions and builds inter-finger relations. To highlight the discriminative details of these fingers, RFaNet reweights the low-level features of the hand depth image with those of the non-forearm image and improves finger localization, even when the wrist is occluded. RFaNet captures neighboring and inter-region dependencies between fingers in high-level features. An atrous convolution procedure enlarges the RFs at multiple scales and a non-local operation computes the interactions between multi-scale feature maps, thereby facilitating the building of inter-finger relations. Thus, the representation of a sign is invariant to viewpoint changes, which are primarily responsible for intra-class variability. On an American Sign Language fingerspelling dataset, RFaNet achieved 1.77% higher classification accuracy than state-of-the-art methods. RFaNet achieved effective transfer learning when the number of labeled depth images was insufficient. The fingerspelling representation of a depth image can be effectively transferred from large- to small-scale datasets via highlighting the finger regions and building inter-finger relations, thereby reducing the requirement for expensive fingerspelling annotations.
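
The citation statement above notes that spatial pyramid pooling yields a fixed-length feature representation regardless of input size. A minimal sketch of that idea follows. The pyramid levels (1, 2, 4), the use of max pooling, and the names `spatial_pyramid_pool` and `make_map` are assumptions chosen for illustration; they are not taken from Tan et al. and may differ from the configuration in the cited paper.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map (nested lists) into a flat vector.

    For each pyramid level n, every channel is split into an n x n grid of
    bins and the maximum of each bin is kept. The output length is
    C * sum(n * n for n in levels), which does not depend on H or W.
    The default levels are an illustrative assumption.
    """
    pooled = []
    for n in levels:
        for channel in feature_map:
            height, width = len(channel), len(channel[0])
            for i in range(n):
                # Floor for the bin start and ceil for the bin end, so the
                # bins cover the whole map and are never empty.
                r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
                for j in range(n):
                    c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                    pooled.append(max(channel[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return pooled


def make_map(channels, height, width):
    """Build a hypothetical feature map with deterministic values."""
    return [[[(ch + 1) * (r * width + c) for c in range(width)]
             for r in range(height)]
            for ch in range(channels)]


# Two inputs of different spatial size give vectors of the same length.
small = spatial_pyramid_pool(make_map(3, 5, 7))
large = spatial_pyramid_pool(make_map(3, 13, 9))
print(len(small), len(large))
```

Because the bin count per level is fixed, a fully connected layer placed after this step would see the same input width whatever the image resolution, which is the property the citing authors describe.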

“…Yong Soon Tan [13] developed CNN with Spatial Pyramid Pooling (SPP) for hand gesture recognition. The model CNN was combined with SPP for hand gesture recognition and it was developed to overcome the conventional pooling problem using multilevel pooling extended the features that were fed for the connected layer.…”

Section: Literature Review (mentioning)

confidence: 99%

“…The features utilized showed high dimension data complexity and thus the accuracy was lowered up-to 93.26 %. Similarly, [12,14] CNN model was utilized for better automated classification but it failed to analyse for different dataset and obtained accuracy of 99%.…”

Table 1. The comparative analysis for the proposed AE-BiLSTM with the existing algorithms

Authors | Methodology | Dataset | Accuracy (%)
Gangrade [11] | Oriented FAST and Rotated BRIEF | ISL | 93.26
Jayesh and Bharti [12] | CNN | | 99
Tan [13] | CNN-SPP | NUS | 98.40
Chandra and Lall [14] | CNN | | 99.67
Madni [15] | Improved reliefF K-nearest neighbour | ISL | 98.95
Proposed Method | AE-Bi-LSTM | ISL | 99.85
 | | NUS | 99.75

Section: Comparative Analysis (mentioning)

confidence: 99%

“…Similarly, [12,14] CNN model was utilized for better automated classification but it failed to analyse for different dataset and obtained accuracy of 99%. The CNN-SPP model [13] was utilized for hand gesture recognition but however it failed to improve the classification accuracy due to the high dimensionality of the data hence it obtained accuracy of 98.40%. Whereas, the proposed AE-Bi-LSTM outperformed with better results when compared to the existing algorithms as the AE is used for dimension reduction, thus, it showed improvement of accuracy for both the datasets; 99.85 % in ISL and 99.75 % in NUS.…”

Section: Comparative Analysis (mentioning)

confidence: 99%

Hand Gesture Recognition using Auto Encoder with Bi-direction Long Short Term Memory

2021

IJIES

“…Kumarage et al (2011) proposed to subdivide the transactions for recognition via parallel processing and mapping the motion data to static data representations, Also, the issue of matching sign language gestures with linear / nonlinear equations was mentioned. In the paper which emerged as a result of the research of Tan et al (2020), a CNN with spatial pyramid pooling for vision-based hand gesture recognition was introduced. The performance of the proposed method was evaluated on American sign language datasets and hand gesture dataset.…”

Section: Literature Survey (mentioning)

confidence: 99%

Recognition of Sign Language Letters Using Image Processing and Deep Learning Methods

Öztürk, Karatekin, Saylar et al. 2021

Journal of Intelligent Systems: Theory and Applications

In order for people to be able to communicate with each other, they must be able to agree mutually. Communication is quite difficult for individuals with hearing problems. Such individuals make their lives much more difficult by isolating themselves from society. The people living with hearing loss can understand the contact person with often lip-reading method, but it is quite difficult for them to express themselves to the people. Since the use of sign language has not become widespread around the world, the number of people who know sign language is very low, except for individuals with hearing disabilities. In this study, it was achieved to dynamically recognize the movements of the sign language finger alphabet via image processing using deep learning methods and to translate it into writing. Accordingly, it is aimed to facilitate communication between people who do not know the sign language in daily life and people with hearing loss. The input given to the system is an image of the hand showing any letter from the alphabet. The image of the hand is interpreted by deep learning methods in the system, and it is compared to one of the letters in the alphabet and an output with the similarity ratio to this letter is displayed on the screen. The system has been tested with a total of 1300 images. The overall accuracy rate of the system was calculated as 88% where true positive rate was 87% and false negative rate was 13%.
