2019
DOI: 10.1007/978-3-030-30493-5_59

Spatial-Temporal Graph Convolutional Networks for Sign Language Recognition

Cited by 51 publications (36 citation statements)
References 15 publications
“…It showed encouraging performance on action recognition of the NTU rgb+d dataset [30]. The proposed algorithm in [29] was modified to accept a custom graph layout, which is appropriate for sign language graph representation [31]. This modified version of the algorithm was evaluated on a dataset containing 20 selected classes from the ASLLVD dataset.…”
Section: Related Work (mentioning)
confidence: 99%
“…The Skeleton Aware multi-stream sign language recognition framework is one of the most recent graph-based systems for sign language recognition [32,33]. These frameworks combined the ST-GCN [31] with other input channels such as RGB frames and optical flow; in a multimodality scheme, the different modalities are integrated and fused at different levels. Even though these systems achieved excellent performance on the AUTSL dataset, the main drawback of this framework is that it is slow and involves a high computation cost.…”
Section: Related Work (mentioning)
confidence: 99%
“…Attention LSTM, attention GRU and Transformer networks were also tested but they led to inferior performance. De Amorim et al in [82], proposed an American SLR method that extracts skeletal data from video sequences and then processes them using a Spatio-Temporal Graph Convolutional Network (GCN) [83]. Tunga et al in [84], proposed a SLR method that extracts skeletal features from video sequences and then employs a GCN network to model spatial dependencies among the skeletal data, as well as a BERT model to model temporal dependencies among the skeletal data.…”
Section: Sign Language Recognition (mentioning)
confidence: 99%
“…In this way, the authors achieved a really high accuracy of 97.36% in the CSL-500 dataset. GCNs are computationally lighter than the image processing networks, but they often cannot extract highly enriched features, thus leading to inferior performance, as noted in [82].…”
Section: Sign Language Recognition (mentioning)
confidence: 99%
“…However, skeleton-based SLR methods are still under exploration. Simply applying the ST-GCN to SLR has been unsuccessful, which only reaches around 60% top-1 accuracy on 20 classes (much worse than RGB-based approaches) [48]. Multi-modal Approach aims to explore data captured from different resources, by different devices, and from distinctive views to improve the overall performance.…”
Section: Related Work (mentioning)
confidence: 99%