2023
DOI: 10.1049/ipr2.12819

Dense video captioning based on local attention

Abstract: Dense video captioning aims to locate multiple events in an untrimmed video and generate a caption for each event. Previous methods had difficulty establishing the multimodal feature relationship between frames and captions, resulting in low accuracy of the generated captions. To address this problem, a novel Dense Video Captioning model based on Local attention (DVCL) is proposed. DVCL employs a 2D temporal differential CNN to extract video features, followed by feature encoding using a deformable…
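The abstract names "local attention" as the model's core mechanism but is truncated before describing it. A common reading of the term is windowed self-attention, where each frame attends only to temporally nearby frames. The following is a minimal NumPy sketch under that assumption; it is an illustration of windowed attention generally, not the authors' DVCL implementation, and the function name and window size are hypothetical.

```python
import numpy as np

def local_attention(q, k, v, window=2):
    """Windowed (local) self-attention: each time step attends only to
    neighbors within `window` steps on either side of it."""
    T, d = q.shape
    out = np.zeros_like(v)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        scores = q[t] @ k[lo:hi].T / np.sqrt(d)   # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                  # softmax over the local window
        out[t] = weights @ v[lo:hi]               # weighted sum of nearby values
    return out

# Toy frame features: 6 time steps, 4-dimensional
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
y = local_attention(x, x, x, window=1)
print(y.shape)  # (6, 4)
```

Restricting attention to a temporal window keeps each frame's representation grounded in its local event context, which is one plausible way such a model could tighten the frame-caption correspondence the abstract describes.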

Cited by 1 publication
References 39 publications