2020
DOI: 10.1109/tmm.2019.2924576

STAT: Spatial-Temporal Attention Mechanism for Video Captioning

Cited by 314 publications (129 citation statements)
References 48 publications
“…Descriptors encode important information about the visual characteristics of the objects present in images [34], such as appearance [35,36], motion [37], or geometry [38]. Therefore, they have been used in multiple contexts.…”
Section: Descriptors (mentioning)
confidence: 99%
“…Alioua et al [35] propose a 2D head pose estimation framework using a combination of classic descriptors, e.g., HoG, SURF, and Haar. Yan et al [36] uses two CNN features to model global and local appearance of the target and a 3D CNN which codify the motion. The computational cost of some descriptors could be expensive, e.g., especially those based on deep learning [36], even using parallelization methods [37].…”
Section: Descriptors (mentioning)
confidence: 99%
“…Other than this, the multi-view is also well known as multiple angles cameras of 3D model [16]. For heterogeneous multimedia features such as multi-feature hashing of video data with local and global visual features [17], [18] and other orthogonal or associated features, researchers use multi/cross-modal hashing to solve the complexity of the fusion problem of multiple modalities. It is disparate from the multi-view hashing.…”
Section: Introduction (mentioning)
confidence: 99%
“…The escalation of high specification computers, high-resolution reasonable cameras, and highly dependent video analysis-based applications drive research in object tracking. To date, object tracking is pertinent to the tasks of motion-based recognition, automated surveillance [1], video captioning [2], human-computer interaction [3], traffic monitoring [4], and autonomous vehicles [5,6].…”
Section: Introduction (mentioning)
confidence: 99%