Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475463

Towards Fast and High-Quality Sign Language Production

Abstract: Sign Language Production (SLP) aims to automatically translate a spoken language description to its corresponding sign language video. The core procedure of SLP is to transform sign gloss intermediaries into sign pose sequences (G2P). Most existing methods for G2P are based on sequential autoregression or sequence-to-sequence encoder-decoder learning. However, by generating target pose frames conditioned on the previously generated ones, these models are prone to bringing issues such as error accumulation and h…

Cited by 16 publications (19 citation statements)
References 41 publications
“…It is worth noting that, as stated by (Huang et al, 2021), the proposed decoding mechanism provides weak supervisions with the initial ground-truth frame and guided counter sequences during the inference time.…”
Section: Progressive Transformer Baseline
mentioning
confidence: 99%
“…Further, Saunders et al [6] proposed a spatial-temporal skeletal graph attention layer that embeds a hierarchical body inductive bias into the self-attention mechanism. Huang et al [4] developed spatial-temporal graph convolution layers into the pose generator which is able to capture both intra-frame and inter-frame information of sign language videos. However, all these methods disregard each joint has different contributions to gestures expression.…”
Section: B Sign Language Production
mentioning
confidence: 99%
“…Recently, Transformer-based methods [1], [2], [3], [4], [5] became the most widespread methods to produce skeletons for SLP. However, there is still a problem in these works: such architecture always ignores the structural relationships of the human skeletons, by which poor performance would be obtained.…”
mentioning
confidence: 99%
“…Recently, there have been many deep learning approaches to SLP proposed [23,42,48,50,52,56,63,71], with Saunders et al achieving state-of-the-art results with gloss supervision [52]. These works predominantly represent sign languages as sequences of skeletal frames, with each frame encoded as a vector of joint coordinates [51] that disregards any spatio-temporal structure available within a skeletal representation.…”
Section: Related Work
mentioning
confidence: 99%