2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016
DOI: 10.1109/cvpr.2016.412

Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data is Continuous and Weakly Labelled

Abstract: This work presents a new approach to learning a frame-based classifier on weakly labelled sequence data by embedding a CNN within an iterative EM algorithm. This allows the CNN to be trained on a vast number of example images when only loose sequence-level information is available for the source videos. Although we demonstrate this in the context of hand shape recognition, the approach has wider application to any video recognition task where frame-level labelling is not available. The iterative EM algorithm le…
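The alternation the abstract describes, re-estimating frame labels from sequence-level labels and then retraining the frame classifier, can be illustrated with a toy sketch. This is not the authors' implementation: the per-class mean below is a hypothetical stand-in for the CNN, the alignment is a hard (Viterbi-style) variant of EM, frames are single numbers rather than images, and all function names are invented for illustration. Each sequence is assumed to have at least as many frames as labels.

```python
def uniform_alignment(num_frames, num_labels):
    # Flat start: split the frames evenly across the ordered labels.
    return [t * num_labels // num_frames for t in range(num_frames)]


def fit_means(sequences, alignments):
    # Classifier update (stand-in for retraining the CNN):
    # one mean per class, estimated from the currently aligned frames.
    sums, counts = {}, {}
    for (frames, labels), align in zip(sequences, alignments):
        for x, k in zip(frames, align):
            c = labels[k]
            sums[c] = sums.get(c, 0.0) + x
            counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def forced_alignment(frames, labels, means):
    # Label re-estimation: best monotonic assignment of frames to the
    # ordered label sequence, found by dynamic programming.
    T, K = len(frames), len(labels)
    neg_inf = float("-inf")

    def score(t, k):
        return -(frames[t] - means[labels[k]]) ** 2

    best = [[neg_inf] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    best[0][0] = score(0, 0)
    for t in range(1, T):
        for k in range(K):
            stay = best[t - 1][k]
            move = best[t - 1][k - 1] if k > 0 else neg_inf
            prev, back[t][k] = (move, k - 1) if move > stay else (stay, k)
            best[t][k] = prev + score(t, k)

    # Backtrack from the last frame, which must carry the last label.
    path, k = [0] * T, K - 1
    for t in range(T - 1, -1, -1):
        path[t] = k
        k = back[t][k]
    return path


def train(sequences, iterations=5):
    # Alternate classifier updates and re-alignment for a fixed number of rounds.
    alignments = [uniform_alignment(len(f), len(l)) for f, l in sequences]
    means = {}
    for _ in range(iterations):
        means = fit_means(sequences, alignments)
        alignments = [forced_alignment(f, l, means) for f, l in sequences]
    return means, alignments


# Two weakly labelled toy sequences: only the label order is known.
toy_sequences = [
    ([0.1, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2, 4.8], ["A", "B"]),
    ([5.0, 5.1, 0.1, 0.0, -0.1, 0.2, 0.1], ["B", "A"]),
]
toy_means, toy_alignments = train(toy_sequences)
```

In this toy setting the flat start places the label boundaries in the wrong frames, and the alternation moves them to where the frame values actually change.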

Cited by 224 publications (174 citation statements)
References 39 publications
“…As shown in Table 3, our Hand SubUNet surpasses the hand shape recognition performance of the state-of-the-art CNN-based method proposed by Koller et al [27], by a margin of 18% Top-1 accuracy, which is a relative improvement of 30%. Koller et al [27] iteratively realigned and retrained his network whereas the SubUNet architecture automatically overcomes the frame alignment issue.…”
Section: Hand SubUNet: End-to-end Hand Shape Recognition and Alignment (mentioning)
confidence: 91%
“…As frame level annotations are hard to come by in continuous datasets, most of the work to date required an alignment step to localize individual signs in videos [10]. The work that is most relevant to this paper is by Koller et al [27] which combines deep-representations with traditional HMM based temporal modelling.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [24], Koller et al propose a CNN-HMM hybrid that learns to localize and recognize hand shapes. They first train a CNN using weak frame level annotations.…”
Section: Introduction (mentioning)
confidence: 99%