2020
DOI: 10.1007/978-3-030-58580-8_9

Learning to Localize Actions from Moments

Cited by 7 publications (6 citation statements)
References 45 publications
“…In addition, the supervision signal should not be limited to instance- or video-level labels. For example, some works [211], [212] first employ trimmed videos from video recognition benchmarks to learn action patterns, then localize action instances in untrimmed videos. Furthermore, the exploration of multiple modalities within video data is essential.…”
Section: Further Discussion and Promising Directions
Citation type: mentioning
confidence: 99%
“…The datasets commonly used for temporal action detection are mainly THUMOS14 [75], MEX-action2 [76], and ActivityNet [77]. The THUMOS14 dataset includes an action recognition part and a temporal action detection part.…”
Section: Action Detection 31 Action Detection Datasets
Citation type: mentioning
confidence: 99%
“…Early approaches usually rely on hand-crafted features, which detect spatio-temporal interest points and then describe these points with local representations [45,46]. With the tremendous success of deep convolution networks on image-based classification tasks [12,35,38,41], researchers started to explore the application of deep networks on video action recognition task [7,18,29,30,54]. In [37], the famous two-stream architecture is devised by applying two 2D CNN architectures separately on visual frames and stacked optical flows.…”
Section: Related Work
Citation type: mentioning
confidence: 99%