2021
DOI: 10.1016/j.neucom.2021.04.072

D-MmT: A concise decoder-only multi-modal transformer for abstractive summarization in videos

Cited by 7 publications
(6 citation statements)
References 7 publications
“…HA [1] is a baseline multiencoder‐decoder model with a hierarchical attention strategy in the decoder part; MFFG [3] is a multi‐stage fusion architecture to fuse multi‐source data, which suppresses the flow of multimodal noise via a forget gate module; D‐MmT [2] is a decoder‐only multimodal transformer framework for video‐containing the MAS task.…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Shang et al [27] introduced a novel short-term order-sensitive attention mechanism to leverage the time clue inside video frames. Liu et al [2] proposed reducing model parameters with a decoder-only multimodal transformer which combines the source inputs and target summary in the shared feature space.…”
Section: Multimodal Abstractive Summarisation; citation type: mentioning
confidence: 99%
“…Liu et al [22] have introduced a Decoder-only Multimodal Transformer (D-MmT), which is modified from the structure of the decoder by including the in-out multimodal decoder. Moreover, Cascaded Cross-Modal interaction (CXMI) creates the joint fusion representation among the modalities.…”
Section: Related Work; citation type: mentioning
confidence: 99%