2020 25th International Conference on Pattern Recognition (ICPR), 2021
DOI: 10.1109/icpr48806.2021.9413097
Hierarchical Multimodal Attention for Deep Video Summarization

Abstract: The way people consume sports on TV has evolved drastically in recent years, particularly under the combined effects of the legalization of sports betting and the huge growth of sports analytics. Several companies nowadays send observers to stadiums to collect live data on all the events happening on the field during a match. These data contain meaningful information providing a very detailed description of all the actions occurring during the match to feed the coaches and staff, the fans, the v…

Cited by 16 publications (9 citation statements)
References 46 publications
“…Fig. 4 illustrates the concept of our fusion mechanism, inspired by the method presented in [17], which fuses the information from both modalities in a hierarchical fashion. First, we feed each embedded video (u^V_t)_{t=1}^T and audio (u^A_t)_{t=1}^T sequence into a separate 128-dim BiGRU layer with hidden states h_t^{{V,A}}…”
Section: Cross-Modality Fusion
Confidence: 99%
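The per-modality encoding step described in this excerpt can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input embedding size and batch dimensions are assumptions, while the 128-dim bidirectional GRU hidden state follows the excerpt.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Encodes one modality's embedded sequence with a 128-dim BiGRU.

    The 128-dim hidden state matches the excerpt; the input embedding
    size (512) is an illustrative assumption.
    """
    def __init__(self, input_dim=512, hidden_dim=128):
        super().__init__()
        self.bigru = nn.GRU(input_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, x):
        # x: (batch, T, input_dim) -> h: (batch, T, 2 * hidden_dim)
        h, _ = self.bigru(x)
        return h

# Separate encoders for the video and audio streams, as in the excerpt.
video_enc = ModalityEncoder()
audio_enc = ModalityEncoder()
u_v = torch.randn(2, 10, 512)  # embedded video sequence (u^V_t)
u_a = torch.randn(2, 10, 512)  # embedded audio sequence (u^A_t)
h_v = video_enc(u_v)  # hidden states h^V_t, shape (2, 10, 256)
h_a = audio_enc(u_a)  # hidden states h^A_t, shape (2, 10, 256)
```

Each modality keeps its own recurrent encoder so the fusion stage can weigh the two hidden-state sequences independently at every time step.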
“…In [16,17], the λ weights are learnt using perceptrons that fully connect, at each time step, the hidden representations of the two modalities. This paper presents a novel approach that computes the modality weights using: (1) an estimate of the uncertainty of the video and audio embedded representations, and (2) self-attention to measure the importance of the video and audio modalities in their local temporal context.…”
Section: Cross-Modality Fusion
Confidence: 99%
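The perceptron-based λ-weighting scheme attributed to [16,17] in this excerpt can be sketched as below. This is a hedged illustration under assumed dimensions: a linear layer maps each time step's concatenated hidden states to two weights, normalized with a softmax so the modality weights sum to one.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuses video and audio hidden states with learned lambda weights.

    A hypothetical sketch of the perceptron scheme described in the
    excerpt: a fully connected layer scores each modality per time
    step, and a softmax turns the scores into fusion weights.
    """
    def __init__(self, hidden_dim=256):
        super().__init__()
        self.weight_net = nn.Linear(2 * hidden_dim, 2)

    def forward(self, h_v, h_a):
        # h_v, h_a: (batch, T, hidden_dim)
        scores = self.weight_net(torch.cat([h_v, h_a], dim=-1))
        lam = torch.softmax(scores, dim=-1)  # (batch, T, 2), sums to 1
        # lam[..., 0:1] weights video, lam[..., 1:2] weights audio.
        fused = lam[..., 0:1] * h_v + lam[..., 1:2] * h_a
        return fused, lam

fusion = WeightedFusion()
h_v = torch.randn(2, 10, 256)
h_a = torch.randn(2, 10, 256)
fused, lam = fusion(h_v, h_a)
```

The citing paper replaces this perceptron with uncertainty estimates and self-attention; the structure above only illustrates the baseline it improves on.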