2021 IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
DOI: 10.1109/wacv48630.2021.00347

Hand Pose Guided 3D Pooling for Word-level Sign Language Recognition

Cited by 32 publications
(7 citation statements)
References 35 publications
“…Firstly, several 2D-CNN-LSTM models have been trained separately using RGB, depth, and optical flow data and then these features are fused at the classification level using the best 2D-CNN-LSTM model. In [11], both motion and hand shape cues have been used as input features and fed to a 3D-CNN. A pose-guided 3D pooling mechanism is used to fuse the prediction score during test time.…”
Section: Hybrid/multi-modal Methods (mentioning)
confidence: 99%
“…The latest deep learning models make use of ConvNets to extract spatial cues and Recurrent neural networks to model temporal dependencies. Some of these models make use of 3D ConvNets [11] to fuse spatial and temporal cues. However, appearance-based methods have distinctively higher computational complexity originating from higher data dimensionality.…”
Section: Introduction (mentioning)
confidence: 99%
“…The overall dataset's performance of the Top-1, Top-2, and Top-3 percentages was analyzed in terms of accuracy for each dataset group. The performance of the hDNN-SLR was evaluated with the other cutting-edge models such as I3D [43], Pose-TGCN [43], Pose-GRU [43], GCN-BERT [26], Multi-Stream [44], and Fusion-3 [45]. Table 2 shows the performance evaluation of hDNN-SLR with other baseline architectures in terms of Accuracy measures using the WLASL100, WLASL300, WLASL1000 and WLASL2000 dataset consisting of 100, 300, 1000 and 2000 word sign videos.…”
Section: E Experimental For Islr Using a Benchmark Dataset (mentioning)
confidence: 99%
“…They achieve 65.89%, 84.11%, and 89.92% for top-1, top-5, and top-10 accuracies, respectively, on WLASL with 100 sign classes, and 56.14%, 79.94%, 86.98% on WLASL with 300 classes. In other work that uses motion and hand shapes to guide the pooling for a three-dimensional CNN on RGB video frames, Hosain et al [87] achieve top-1, top-5, and top-10 accuracies on WLASL 100 of 75.67%, 86.83%, and 90.91%, respectively, and 68.30%, 84.19%, 87.06% on WLASL 300, using their Fusion-2 and Fusion-3 models.…”
Section: Related Work (mentioning)
confidence: 99%
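
The citation statements above report top-1, top-5, and top-10 accuracies on WLASL subsets. As a reading aid, here is a minimal sketch of how a top-k accuracy figure is typically computed from per-class scores. It is not taken from any of the cited papers: the function name `top_k_accuracy` and the toy scores and labels are illustrative assumptions only.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per class (hypothetical input format)
    labels: list of true class indices, one per sample
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 4 samples and 3 classes (made-up numbers, not real results).
scores = [
    [0.1, 0.7, 0.2],  # true class 1 is ranked first
    [0.5, 0.3, 0.2],  # true class 1 is ranked second
    [0.2, 0.3, 0.5],  # true class 0 is ranked third
    [0.6, 0.1, 0.3],  # true class 0 is ranked first
]
labels = [1, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
top3 = top_k_accuracy(scores, labels, 3)
```

Because the top-k set grows with k, top-k accuracy never decreases as k increases, which matches the ordering of the top-1, top-5, and top-10 figures quoted above.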