2023
DOI: 10.1049/cvi2.12173

MCR: Multilayer cross‐fusion with reconstructor for multimodal abstractive summarisation

Abstract: Multimodal abstractive summarisation (MAS) aims to generate a textual summary from multimodal data collection, such as video‐text pairs. Despite the success of recent work, the existing methods lack a thorough analysis for consistency across multimodal data. Besides, previous work relies on the fusion method to extract multimodal semantics, neglecting the constraints for complementary semantics of each modality. To address those issues, a multilayer cross‐fusion model with the reconstructor for the MAS task is…

Cited by 2 publications (1 citation statement)
References 50 publications
“…Yuan et al [ 20 ] have introduced the Multi-Layer cross-fusion with a Re-constructor (MCR) to create a textual summary from the multimodal video collection. The MCR performs cross-fusion through the layer blocks of cross-model transformers and it results in a cross-modal representation.…”
Section: Related Work (mentioning)
confidence: 99%