2020
DOI: 10.1007/978-3-030-58621-8_40

Progressive Transformers for End-to-End Sign Language Production

Abstract: The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video at a level comparable to a human translator. If this was achievable, then it would revolutionise Deaf hearing communications. Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences. In this paper, we propose Progressive Transformers, the first SLP model to translate from discrete spoken…

Cited by 73 publications (137 citation statements)
References 44 publications
“…This approach resulted in smooth transitions between glosses and refined details on hand and finger shapes. Saunders et al in [114], employed Transformers to automatically generate 3D human poses from spoken text using a multiple-level configuration. A text-to-gloss-to-pose (T2G2P) network with Transformer layers translated text sentences to sign language glosses and finally to 3D poses, while a text-to-pose (T2P) network directly transformed text into human poses.…”
Section: Sign Language Representation (mentioning)
confidence: 99%
“…On RWTH-PHOENIX-Weather 2014T, we obtain 22.17 BLEU on testing; on Public DGS Corpus, we obtain a mere 3.2 BLEU. Although Transformers achieve encouraging results on RWTH-PHOENIX-Weather 2014T (Saunders et al, 2020b;, they fail on more realistic, open-domain data. These results reveal that firstly, for real-world applications, we need more data to train such types of models, and secondly, while available data is severely limited in size, less data-hungry and more linguistically-informed approaches may be more suitable.…”
Section: Collect Real-world Data (mentioning)
confidence: 99%
“…Sign language, a rich visual language with complex grammatical structures, is the language of communication for the Deaf community. To involve the Deaf in the predominantly spoken language of the wider world, a large amount of methods [38][39][40][43] have been recently proposed to tackle the challenging Sign Language Production (SLP) problem. Given a spoken language description, SLP aims to automatically translate it into the corresponding continuous sign sequence.…”
Section: Introduction (mentioning)
confidence: 99%
“…Given a spoken language description, SLP aims to automatically translate it into the corresponding continuous sign sequence. Generally, sign sequences can be represented as sign skeleton pose sequences [38,40] or sign language videos [39].…”
Section: Introduction (mentioning)
confidence: 99%