2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.332

SubUNets: End-to-End Hand Shape and Continuous Sign Language Recognition

Abstract: We propose a novel deep learning approach to solve simultaneous alignment and recognition problems (referred to as "Sequence-to-sequence" learning). We decompose the problem into a series of specialised expert systems referred to as SubUNets. The spatio-temporal relationships between these SubUNets are then modelled to solve the task, while remaining trainable end-to-end. The approach mimics human learning and educational techniques, and has a number of significant advantages. SubUNets allow us to inject domain…

Cited by 261 publications (125 citation statements)
References 28 publications
“…As future work, it would be interesting to extend the attention mechanisms to the spatial domain to align building blocks of signs, also known as subunits, with their spoken language translations. It may also be possible to use an approach similar to SubUNets [6] to inject specialist intermediate subunit knowledge, bridging the gap between S2T and S2G2T.…”
Section: Results (mentioning, confidence: 99%)
“…Until recently SLR methods have mainly used handcrafted intermediate representations [33,16] and the temporal changes in these features have been modelled using classical graph based approaches, such as Hidden Markov Models (HMMs) [58], Conditional Random Fields [62] or template based methods [5,48]. However, with the emergence of DL, SLR researchers have quickly adopted Convolutional Neural Networks (CNNs) [40] for manual [35,37] and non-manual [34] feature representation, and Recurrent Neural Networks (RNNs) for temporal modelling [6,36,17].…”
Section: Related Work (mentioning, confidence: 99%)
“…Some research studies are already investigating continuous sign language translation [59][60][61]. Our future plan is to embed our system into a continuous sign language translation system.…”
Section: Results (mentioning, confidence: 99%)
“…Such applications include Continuous Sign Language Recognition [4] and Video Captioning [7]. To be able to train spatio-temporal deep networks using sequence level annotations, researchers adopted sequence-to-sequence learning methods from other fields, namely Connectionist Temporal Classification [18] from Speech Recognition [19] and Encoder-Decoder Networks [8] from the field of Neural Machine Translations [1].…”
Section: Introduction (mentioning, confidence: 99%)
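The excerpt above names Connectionist Temporal Classification (CTC) as the technique that lets spatio-temporal networks be trained from sequence-level annotations alone. The sketch below illustrates generic CTC, not the SubUNets implementation. It assumes per-frame softmax outputs as plain Python lists with the blank symbol at index 0. The helper names `ctc_greedy_decode` and `ctc_sequence_prob` are hypothetical. The forward pass sums the probability of every frame-level alignment that collapses to the target label sequence, which is the quantity whose negative log CTC training minimises. Greedy (best-path) decoding takes the per-frame argmax, merges repeats and drops blanks.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 of each frame distribution is the CTC blank


def ctc_greedy_decode(frame_probs):
    """Best-path decoding: argmax per frame, merge repeated symbols, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    return [k for k, _ in groupby(best) if k != BLANK]


def ctc_sequence_prob(frame_probs, labels):
    """CTC forward algorithm: total probability of all alignments collapsing to `labels`.

    Works in probability space for readability; real implementations use log space
    to avoid underflow on long sequences.
    """
    # Extended label sequence with blanks interleaved: [b, l1, b, l2, ..., b]
    ext = [BLANK]
    for lab in labels:
        ext += [lab, BLANK]
    S, T = len(ext), len(frame_probs)

    # alpha[s]: probability of emitting the prefix ext[:s+1] by the current frame
    alpha = [0.0] * S
    alpha[0] = frame_probs[0][ext[0]]
    if S > 1:
        alpha[1] = frame_probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]  # advance by one position
            # skip over a blank, allowed only between two different non-blank labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * frame_probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


if __name__ == "__main__":
    frames = [[0.5, 0.5], [0.5, 0.5]]  # two frames, symbols {blank, 1}
    print(ctc_sequence_prob(frames, [1]))  # alignments (1,1), (1,b), (b,1)
```

With two uniform frames, the label sequence `[1]` is reachable through three of the four possible alignments, so its probability is 0.75. The sequence `[1, 1]` is unreachable because a blank must separate the repeated label.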