2020
DOI: 10.1609/aaai.v34i07.7001
Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

Abstract: Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. Such a characteristic heavily constrains their capability to learn the implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression, and body posture). By injecting multi-cue learning into the neural network design, we propose a spatial-temporal multi-c…

Cited by 150 publications (107 citation statements)
References 55 publications (104 reference statements)
“…Then, a translation system translates the recognized glosses into spoken language. Recent work (Orbay and Akarun, 2020; Zhou et al., 2020) has addressed the first step, but none has improved the translation system. This paper aims to fill this research gap by leveraging the recent success of Neural Machine Translation (NMT), namely Transformers.…”
Section: Introduction
confidence: 99%
“…The whole architecture was optimized iteratively in two stages using CTC loss and cross-entropy loss with pseudo-labels. In [45], the authors proposed a method for exploiting not only RGB data but also information from multiple cues, such as the pose, hands, and face of the signer, aiming to find correlations between the different cues.…”
Section: Related Work
confidence: 99%
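The CTC objective mentioned above lets the network train on unsegmented video without frame-level alignments; at inference time, a frame-wise label path is collapsed into a gloss sequence by merging consecutive repeats and dropping blanks. A minimal sketch of that collapse step (the function name and the toy label path are illustrative, not taken from the cited paper):

```python
def ctc_collapse(path, blank=0):
    """Collapse a frame-wise CTC label path into an output sequence:
    merge consecutive repeated labels, then drop blank symbols."""
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Toy frame-wise path over 7 video frames (0 is the CTC blank):
print(ctc_collapse([1, 1, 0, 1, 2, 2, 0]))  # -> [1, 1, 2]
```

Note that the blank between the two 1s is what allows a repeated gloss to survive collapsing; without it, consecutive identical labels are merged.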
“…In Table 3, the proposed method is compared against several state-of-the-art methods using only RGB data (for a fair comparison, methods based on multi-cue information, e.g., [45], are not included in Table 3) on the RWTH-Phoenix-Weather-2014 dataset. SLRGAN achieved a WER of on the validation set and on the test set.…”
Section: Experimental Evaluation
confidence: 99%
“…In recent years, feature fusion has gradually received attention. To make full use of different types of features, Su [37] combined a CNN and an LSTM to form a fusion network, while H. Zhou and W. Zhou designed the spatial-temporal multi-cue network [38] to fully exploit features from different cues.…”
Section: Network Learning
confidence: 99%
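One common way to realize this kind of multi-cue fusion is late fusion: per-cue feature vectors (e.g., full frame, hands, face, pose) are concatenated and then projected into a joint representation. A minimal NumPy sketch under that assumption (the cue names, feature dimensions, and random projection stand in for learned components):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 64-d features extracted per cue for one video frame:
cues = {
    "full_frame": rng.standard_normal(64),
    "hands":      rng.standard_normal(64),
    "face":       rng.standard_normal(64),
    "pose":       rng.standard_normal(64),
}

# Late fusion: concatenate all cue features into one vector ...
fused = np.concatenate(list(cues.values()))   # shape (256,)

# ... then linearly project into a joint 128-d space
# (W would be a learned weight matrix in practice):
W = 0.01 * rng.standard_normal((128, fused.size))
joint = W @ fused                              # shape (128,)
print(joint.shape)  # -> (128,)
```

In a full model, the joint representation would feed a temporal module (e.g., an LSTM or temporal convolution) so that correlations across cues can also be exploited over time.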