2018
DOI: 10.1007/978-3-030-01231-1_19

Recurrent Tubelet Proposal and Recognition Networks for Action Detection

Cited by 113 publications (80 citation statements)
References 32 publications
“…More recently, the method proposed by Khan and Borji [27] used a fine-tuned version of RefineNet [16] in conjunction with Conditional Random Fields to achieve pixel-level hand segmentation, and later used the segmentation masks with AlexNet for ego hand activity detection. Li et al. [14] proposed the concept of recurrent tubelet proposal and recognition. In this approach, the current hand region is extracted recurrently based on its previous location, and features are calculated on this extracted area. These features are then fed into a separate network for recognising gestures.…”
Section: Previous Work
confidence: 99%
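The proposal-then-recognize pipeline summarized in this statement can be illustrated with a short sketch. The code below is a minimal PyTorch illustration under my own assumptions, not the implementation of Li et al. [14]: the names (RecurrentTubeletProposer, crop_region), the backbone, the feature dimension, and the separate gesture classifier are hypothetical stand-ins for the idea of cropping the current frame around the previous location, extracting features on that crop, and passing them to a separate recognition network.

```python
# Minimal sketch (assumed PyTorch) of the recurrent tubelet idea summarized above.
# All names, shapes, and layers are hypothetical placeholders, not the
# architecture of Li et al. [14].
import torch
import torch.nn as nn
import torch.nn.functional as F


def crop_region(frame, box, size=112):
    # Hypothetical helper: crop the frame to the previous box and resize to a fixed patch.
    _, _, h, w = frame.shape
    x1, y1, x2, y2 = [int(round(v)) for v in box.tolist()]
    x1, y1 = max(0, min(x1, w - 2)), max(0, min(y1, h - 2))
    x2, y2 = max(x1 + 1, min(x2, w)), max(y1 + 1, min(y2, h))
    patch = frame[:, :, y1:y2, x1:x2]
    return F.interpolate(patch, size=(size, size), mode="bilinear", align_corners=False)


class RecurrentTubeletProposer(nn.Module):
    """Proposes the hand region for frame t from the region found at frame t-1."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, kernel_size=7, stride=4)  # stand-in feature extractor
        self.box_regressor = nn.Linear(feat_dim, 4)                      # predicts a box offset

    def forward(self, frame, prev_box):
        crop = crop_region(frame, prev_box)            # look only where the hand was last seen
        feat = self.backbone(crop).mean(dim=(-1, -2))  # (1, feat_dim) pooled features of the crop
        new_box = prev_box + self.box_regressor(feat).squeeze(0)  # shifted region for this frame
        return new_box, feat


# Usage: track the region over a clip, then hand the per-frame features to a
# *separate* recognition network (here just a linear placeholder classifier).
proposer = RecurrentTubeletProposer()
recogniser = nn.Linear(256, 10)                        # hypothetical gesture classifier
frames = torch.rand(8, 1, 3, 224, 224)                 # an 8-frame clip
box = torch.tensor([60.0, 60.0, 160.0, 160.0])         # initial hand location (x1, y1, x2, y2)
feats = []
for frame in frames:
    box, feat = proposer(frame, box)
    feats.append(feat)
logits = recogniser(torch.stack(feats).mean(dim=0))    # clip-level gesture scores
```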
“…These features are then fed into a separate network for recognising gestures. In all the above approaches [1,33,27,14], features were calculated on the extracted ego hand masks and then provided as input to a different recognition system. In our approach, instead, we calculate features that are used simultaneously for both ego hand mask generation and ego gesture recognition, which also gives our network architecture the ability to train end-to-end, something that was not possible in earlier approaches.…”
Section: Previous Work
confidence: 99%
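The contrast drawn in this statement, one set of features serving both the ego hand mask and the gesture label so the whole network trains end-to-end, can be sketched as a shared backbone with two heads and a joint loss. Everything below (layer sizes, class count, loss placeholders) is an illustrative assumption, not the citing paper's architecture.

```python
# Toy sketch of a shared backbone with two heads trained jointly end-to-end.
# Layer sizes, class count, and losses are illustrative assumptions only.
import torch
import torch.nn as nn


class SharedHandNet(nn.Module):
    def __init__(self, num_gestures=10):
        super().__init__()
        self.backbone = nn.Sequential(                      # one feature extractor for both tasks
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.mask_head = nn.Conv2d(128, 1, kernel_size=1)   # per-pixel hand-mask logits
        self.gesture_head = nn.Linear(128, num_gestures)    # image-level gesture logits

    def forward(self, x):
        f = self.backbone(x)
        mask = self.mask_head(f)                            # (N, 1, H/4, W/4)
        gesture = self.gesture_head(f.mean(dim=(-1, -2)))   # (N, num_gestures)
        return mask, gesture


# Because both heads share the backbone, one summed loss back-propagates
# through the same features, i.e. the model trains end-to-end.
net = SharedHandNet()
mask_logits, gesture_logits = net(torch.rand(2, 3, 224, 224))
loss = mask_logits.mean() + gesture_logits.mean()           # placeholder for BCE + cross-entropy
loss.backward()
```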
“…T-CNN [12] and ACT [17] improve on [28,35] by modeling short-term temporal information. RTPR [24], which explores long-term temporal dynamics with an LSTM, further boosts performance. [40] achieves a video-mAP gain by modeling the relation between the human and the global context, but still yields inferior performance to our LSTR.…”
Section: Comparison With State-of-the-art
confidence: 99%
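As characterized in this statement, RTPR's gain comes from running an LSTM over the tubelet to capture long-term temporal dynamics. A minimal sketch of that general idea follows; the feature dimension, sequence length, class count, and mean-pooled tubelet prediction are assumptions for illustration, not values from the paper.

```python
# Minimal sketch: an LSTM over per-frame tubelet features so classification
# can use long-term temporal context. Dimensions and aggregation are assumptions.
import torch
import torch.nn as nn

feat_dim, hidden, num_classes, T = 512, 256, 24, 16
tubelet_feats = torch.rand(T, 1, feat_dim)        # one feature vector per frame of the tubelet

lstm = nn.LSTM(input_size=feat_dim, hidden_size=hidden)
classifier = nn.Linear(hidden, num_classes)

outputs, _ = lstm(tubelet_feats)                  # (T, 1, hidden): each step summarizes frames 1..t
per_frame_logits = classifier(outputs)            # per-frame action scores with temporal context
tubelet_logits = per_frame_logits.mean(dim=0)     # aggregate into a tubelet-level prediction
```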
“…The first kind is video summarization methods [32,58], which generate a short synopsis of a long video. The second kind of method [7,8,13,14,19,22,31,37,41] tries to trim out the video segment of interest. Using natural language as a query, [14,19] retrieve a specific temporal segment in a video that shares the same semantic meaning as the query.…”
Section: Introduction
confidence: 99%
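The second kind of method mentioned here, retrieving the temporal segment that matches a natural-language query, typically reduces to scoring candidate segments against the query in a shared embedding space. The sketch below shows only that generic pattern; the encoders, feature sizes, and cosine-similarity scoring are assumptions, not the approaches of [14] or [19].

```python
# Generic sketch of language-driven temporal retrieval: embed the query and each
# candidate segment, then pick the most similar segment. Encoders are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, num_segments = 128, 12
query_encoder = nn.Linear(300, dim)                  # stand-in for a sentence encoder
segment_encoder = nn.Linear(2048, dim)               # stand-in for a video-segment encoder

query_emb = F.normalize(query_encoder(torch.rand(1, 300)), dim=-1)
segment_embs = F.normalize(segment_encoder(torch.rand(num_segments, 2048)), dim=-1)

scores = segment_embs @ query_emb.t()                # cosine similarity, one score per segment
best = scores.argmax().item()                        # index of the retrieved segment
```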