2020
DOI: 10.1007/978-3-030-58526-6_43

Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning

Cited by 92 publications (47 citation statements)
References 28 publications
“…This again demonstrates the superiority of the proposed feature enhancement network.

Method                                  mAP@0.5      mAP@0.75     mAP@0.95    Avg
(Paul, Roy, and Roy-Chowdhury 2018)     37.0         12.7         1.5         18.0
CleanNet                                37.1         20.3         5.0         21.6
Liu et al (Liu, Jiang, and Wang 2019)   36.8         22.9         5.6         22.4
TSM (Yu et al 2019)                     28.3         17.0         3.5         17.1
RPN                                     37.6         23.9         5.4         23.3
Gong et al (Gong et al 2020)            40.0         25.0         4.6         24.6
EM-MIL (Luo et al 2020)                 37.4         -            -           20.3
TSCN (Zhai et al 2020)                  37.6         23.7         5.7         23.6
BaS-Net (Lee, Uh, and Byun 2020)        38.5         24.2         5.6         24.3
BaS-Net* (Lee, Uh, and Byun 2020)       36.9         23.3         5.1         22.4
  +ACGNet                               40.8 (+3.9)  25.3 (+2.0)  5.6 (+0.5)  25.1 (+2.7)
DGAM (Shi et al 2020)                   41.0         23.5         5.3         24.4
DGAM* (Shi et al 2020)                  40.3         23.2         5.0         24.0
  +ACGNet                               41.4 (+1.1)  24.2 (+1.0)  5.5 (+0.5)  24.9 (+0.9)
BaM (Lee et al 2021)                    41.2         25.6         6.0         25.9
BaM* (Lee et al 2021)                   40.8         24.9         5.8         25.6
  +ACGNet                               41.8 (+1.0)  26.0 (+1.1)  5.9 (+0.1)  26.1 (+0.5)…”
Section: Comparison To State-of-the-art Methods
confidence: 99%
“…For example, P-GCN (Zeng et al 2019) constructs a graph according to the distances and IoUs between proposals, aiming to adjust the category and boundary of each proposal using context information. G-TAD (Xu et al 2020) makes use of not only temporal context but also semantic context captured through graph convolutional networks (GCNs), casting temporal action detection as a sub-graph localization problem. GTRM (Huang, Sugano, and Sato 2020) employs a GCN to integrate all action segments within a certain time span for the action segmentation task.…”
Section: Related Work
confidence: 99%
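
To make the graph construction concrete, here is a minimal sketch of how a proposal graph of this kind might be built. It assumes proposals are (start, end) intervals in seconds; the tIoU threshold and the distance kernel width are illustrative placeholders, not the actual settings used by P-GCN.

    import numpy as np

    def temporal_iou(a, b):
        # tIoU between two proposals given as (start, end) in seconds
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0

    def build_proposal_graph(proposals, iou_thresh=0.2, dist_sigma=5.0):
        # Adjacency over proposals: hard edges between overlapping proposals,
        # soft edges decaying with center distance for disjoint ones.
        # iou_thresh and dist_sigma are illustrative, not P-GCN's values.
        n = len(proposals)
        adj = np.eye(n, dtype=np.float32)  # self-loops
        centers = [(s + e) / 2.0 for s, e in proposals]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                if temporal_iou(proposals[i], proposals[j]) >= iou_thresh:
                    adj[i, j] = 1.0  # overlapping proposals share context
                else:
                    adj[i, j] = np.exp(-abs(centers[i] - centers[j]) / dist_sigma)
        # row-normalize so a graph convolution averages neighbor features
        return adj / adj.sum(axis=1, keepdims=True)

    proposals = [(1.0, 3.0), (2.5, 5.0), (10.0, 12.0)]
    print(build_proposal_graph(proposals))

A GCN layer applied on top of this adjacency lets each proposal refine its category score and boundaries from its neighbors, which is the intuition behind the context-adjustment step described above.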
“…Weakly supervised temporal action localization provides an efficient way to detect action instances without the overhead of dense frame-level annotations. Many works tackle this problem using the multiple-instance learning (MIL) framework [12,13,19,20,24,25,32]. Several works [13,35] aggregate snippet-level class scores to produce video-level predictions and learn from video-level action labels.…”
Section: Related Work
confidence: 99%
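
The MIL aggregation these works share can be summarized in a few lines. Below is a minimal PyTorch sketch, assuming top-k mean pooling over a class activation sequence, which is one common aggregation choice; the feature dimension, class count, and k ratio are illustrative, not values from any of the cited methods.

    import torch
    import torch.nn as nn

    class MILHead(nn.Module):
        # Snippet features -> snippet class scores -> video-level prediction.
        # feat_dim, num_classes, and k_ratio are illustrative placeholders.
        def __init__(self, feat_dim=2048, num_classes=20, k_ratio=8):
            super().__init__()
            self.classifier = nn.Linear(feat_dim, num_classes)
            self.k_ratio = k_ratio

        def forward(self, feats):            # feats: (B, T, feat_dim)
            cas = self.classifier(feats)     # class activation sequence (B, T, C)
            k = max(1, feats.shape[1] // self.k_ratio)
            topk, _ = cas.topk(k, dim=1)     # k highest-scoring snippets per class
            video_logits = topk.mean(dim=1)  # (B, C) video-level scores
            return cas, video_logits

    model = MILHead()
    feats = torch.randn(2, 64, 2048)   # two videos, 64 snippets each
    labels = torch.zeros(2, 20)
    labels[0, 3] = 1.0
    labels[1, 7] = 1.0
    cas, logits = model(feats)
    # only the video-level label supervises training; no snippet labels needed
    loss = nn.BCEWithLogitsLoss()(logits, labels)
    loss.backward()

The snippet-level scores (cas) are never directly supervised; localization is recovered from them at test time, which is exactly what makes the setting weakly supervised.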
“…However, these fully supervised methods require extensive manual frame/snippet-level annotations. To address this problem, many weakly supervised temporal action localization (WS-TAL) methods [13,14,25,38,50] have been proposed to detect action instances in a given video using only video-level labels, which are far cheaper for an annotator to obtain.…”
Section: Introduction
confidence: 99%
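
At inference time, most WS-TAL pipelines turn the snippet-level scores into detected segments by thresholding. A minimal sketch, assuming per-snippet class probabilities and a fixed snippet duration; both the threshold and the duration are illustrative placeholders.

    import numpy as np

    def localize(cas, cls, threshold=0.5, snippet_sec=0.533):
        # cas: (T, C) per-snippet class probabilities.
        # Merge consecutive above-threshold snippets for class `cls`
        # into (start_sec, end_sec, score) segments.
        scores = cas[:, cls]
        above = scores >= threshold
        segments, start = [], None
        for t, on in enumerate(above):
            if on and start is None:
                start = t
            elif not on and start is not None:
                segments.append((start * snippet_sec, t * snippet_sec,
                                 float(scores[start:t].mean())))
                start = None
        if start is not None:
            segments.append((start * snippet_sec, len(scores) * snippet_sec,
                             float(scores[start:].mean())))
        return segments

    cas = np.random.rand(64, 20)  # dummy activation sequence
    print(localize(cas, cls=3))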
“…A variety of applications are possible using this approach. Recent applications include a progressive learning framework for weakly supervised object detection [25], localization of action segments in video [26], and algorithms within the fields of geoscience and remote sensing [27], [28]. The approach has also been adopted in medical imaging.…”
Section: Related Work
confidence: 99%