2018
DOI: 10.1109/tmm.2018.2839534

Fully Convolutional Network for Multiscale Temporal Action Proposals

Cited by 28 publications (3 citation statements)
References 29 publications
“…1) Similarity of inter-categorical actions: Multiple Instance Learning: The idea of extracting pieces of the input as proposals to then in a second stage decide which of these proposals are indeed classified as positive has been widely used in object detection [67], [32], [49], [50], [82], [74] and action detection [11], [30], [90], [19], [34]. The main goal of the proposals extraction is to filter as much as possible the relevant and non-relevant information by identifying the negative parts of the sample (i.e., background in the case of object detection and non-action in the case of action detection), in order to be discarded for the following classification stage.…”
Section: A Generation Of Action Proposals
Type: mentioning
confidence: 99%
“…Temporal convolution neural network is a common method to model sequential information [30][31][32][33]. Convolution layer is demonstrated implicitly to learn absolute position information from the commonly used padding operation [22].…”
Section: Position Encoding In Convolution
Type: mentioning
confidence: 99%
“…It leads to an important yet challenging task for video analysis: Temporal Action Localization (TAL), which requires to not only classify the untrimmed videos into specific categories accurately, but also locate the temporal boundaries of action instances precisely. Although substantial progress has been achieved on this task [41], [26], [39], [16], [6], [18], [10], [9], it is still limited for industrial applications due to the huge amount of temporal annotations used for training such a deep learning based model in a fully-supervised manner, which are labor-intensive to annotate especially for a large-scale dataset. On the contrary, weak labels such as video-level labels are much easier to obtain, hence many current works try to handle this problem under weak supervision.…”
Section: Introduction
Type: mentioning
confidence: 99%