2021
DOI: 10.48550/arxiv.2107.03783
Preprint

Use of Affective Visual Information for Summarization of Human-Centric Videos

Abstract: Increasing volume of user-generated human-centric video content and their applications, such as video retrieval and browsing, require compact representations that are addressed by the video summarization literature. Current supervised studies formulate video summarization as a sequence-to-sequence learning problem and the existing solutions often neglect the surge of human-centric view, which inherently contains affective content. In this study, we investigate the affective-information enriched supervised vide…

Cited by 1 publication (2 citation statements)
References 40 publications (86 reference statements)
“…Ji et al [16] solve the problem of short-term contextual attention insufficiency and distribution inconsistency. Köprü [17] proposes two new architectures based on temporal attention (TA-AVSUM) and spatial attention (SA-AVSUM).…”
Section: Related Work (mentioning)
confidence: 99%
“…$L_{cro}$ is the loss function of cross-attention. Both $L_{mat}$ and $L_{dat}$ are cross-entropy, as defined by Equation (17). To resolve the centralization issue and reduce the ambiguity problem in key frame filtering, we use $L_{cen}$ for centralization keyframe scores:…”
Section: Loss Function (mentioning)
confidence: 99%