2022
DOI: 10.1109/tcsvt.2021.3131721

POS-Trends Dynamic-Aware Model for Video Caption

Cited by 18 publications (3 citation statements)
References 61 publications
“…This is achieved by generating correct words and providing direction for decoding. As shown by several MSVD, MSR-VTT, and VATEX tests, our approach is superior to the most recent and cutting-edge methods for solving BLEU-4, ROUGE-L, METEOR, and CIDEr [40].…”
Section: Literature Review (mentioning)
confidence: 80%
“…Researchers [78−85] usually use attention mechanisms in the temporal and spatial (regional) dimensions. Yao et al [86] tried to fuse different visual information in different frames based on the temporal attention module. Some methods [80,83] use a spatial attention mechanism to enhance the important parts within each frame.…”
Section: Methods (mentioning)
confidence: 99%
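
The temporal attention idea mentioned in the statement above can be illustrated with a minimal sketch. It assumes a plain dot-product score between a query vector and each frame feature; the cited methods typically learn the scoring function, so `temporal_attention`, `frame_feats`, and `query` are hypothetical names used only for illustration.

```python
import math


def temporal_attention(frame_feats, query):
    """Fuse per-frame feature vectors into one context vector.

    frame_feats: list of equal-length feature vectors, one per frame.
    query: vector of the same length (e.g. a decoder state); an assumption here.
    """
    # Relevance score of each frame: dot product with the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in frame_feats]
    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the frame features.
    dim = len(frame_feats[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, frame_feats))
        for d in range(dim)
    ]
    return weights, context
```

A spatial variant would apply the same weighting over regions within a frame instead of over frames.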
“…Next, each color channel was bandpass filtered using a Butterworth filter (0.75-2.75 Hz; 6th order). We then dimension-reduced the RGB array, initially representing changes in average 3D RGB color space, to a single array, eventually representing only heart-beat-related fluctuations, using the plane-orthogonal-to-skin algorithm (POS; Wang, Li et al 2022) with a sliding window size of 1.6 s. This final array was converted to the frequency domain using a time-frequency analysis based on Lomb-Scargle periodogram calculations per sliding window of 10 s with a temporal resolution of 240 points and frequency resolution of 120 points. The resulting power density functions, represented in a 2D (240 by 120) data array, were converted to signal-to-noise ratios (i.e., SNR; also termed coherence) per time point by dividing each power value by the sum of all absolute power values.…”
Section: rPPG Extraction (mentioning)
confidence: 99%
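
Two steps of the pipeline quoted above can be sketched in plain Python: the sliding-window plane-orthogonal-to-skin projection and the per-time-point SNR normalisation. This is a sketch under stated assumptions, not the citing authors' implementation: it follows the commonly published formulation of the projection (window-mean normalisation, two fixed projection axes, overlap-add), and the function names are hypothetical. The Butterworth filtering and the Lomb-Scargle periodogram are omitted because they need a numerical library.

```python
from statistics import mean, pstdev


def pos_pulse(rgb, fps, window_s=1.6):
    """Reduce per-frame mean RGB values to a single pulse signal.

    rgb: list of (r, g, b) tuples with positive values, one per frame.
    fps: frame rate in Hz; window_s: sliding window length in seconds.
    """
    n = len(rgb)
    win = max(2, int(round(window_s * fps)))  # window length in frames
    pulse = [0.0] * n
    for start in range(0, n - win + 1):
        block = rgb[start:start + win]
        # Temporal normalisation: divide each channel by its window mean.
        means = [mean(channel) for channel in zip(*block)]
        norm = [[c / m for c, m in zip(frame, means)] for frame in block]
        # Project onto two axes spanning the plane orthogonal to skin tone.
        s1 = [g - b for _, g, b in norm]
        s2 = [-2.0 * r + g + b for r, g, b in norm]
        # Combine the two projections, scaling the second to match the first.
        sd2 = pstdev(s2)
        alpha = pstdev(s1) / sd2 if sd2 > 0 else 0.0
        h = [a + alpha * b for a, b in zip(s1, s2)]
        h_mean = mean(h)
        # Overlap-add the zero-mean window into the output signal.
        for i, value in enumerate(h):
            pulse[start + i] += value - h_mean
    return pulse


def power_to_snr(power):
    """Divide each power value by the sum of all absolute power values."""
    total = sum(abs(p) for p in power)
    return [p / total for p in power] if total else [0.0] * len(power)
```

In the quoted setup, `power_to_snr` would be applied to each of the 240 time points, each holding 120 frequency bins.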