2023
DOI: 10.1109/tnnls.2019.2962815

AdapNet: Adaptability Decomposing Encoder–Decoder Network for Weakly Supervised Action Recognition and Localization

Abstract: The point process is a solid framework to model sequential data, such as videos, by exploring the underlying relevance. As a challenging problem for high-level video understanding, weakly supervised action recognition and localization in untrimmed videos has attracted intensive research attention. Knowledge transfer by leveraging the publicly available trimmed videos as external guidance is a promising attempt to make up for the coarse-grained video-level annotation and improve the generalization performance. …

Cited by 39 publications (17 citation statements)
References 58 publications
“…With the rapid development of deep learning, many deep neural networks have been proposed, such as AlexNet [39], ZF [40], VGG [36], GoogleNet [41], and ResNet [42], etc. Different network frameworks and learning strategies had been designed in some researches, which effectively improve the generalization ability of the network models [43]-[45]. By designing different numbers of weight layers, networks with different depths can be built by these networks.…”
Section: Introduction (mentioning)
confidence: 99%
“…All the previously mentioned works propose fully supervised approaches. Since it is more complicated each day to have labels for such big amount of videos, the community is exploring also weakly supervised alternatives [22]-[29]. In any case, our analysis focuses on the different problem of online action detection.…”
Section: Related Work (mentioning)
confidence: 99%
“…In particular, they aggregated activations of each layer and concatenated them into the final representation, which achieved satisfactory results. The research of [26] and [27] also showed that visual recognition tasks make a considerable difference, which needs to be considered in the process of constructing a depth model. For example, during the construction of the action recognition model of [28], the adaptability of the model to a weak supervised dataset was taken into account.…”
Section: Related Work (mentioning)
confidence: 99%