Proceedings of the 2nd Workshop on Multimedia for Accessible Human Computer Interfaces 2019
DOI: 10.1145/3347319.3356839
Semantic Enhanced Encoder-Decoder Network (SEN) for Video Captioning

Cited by 2 publications (2 citation statements)
References 30 publications
“…Although the primary focus is on crop recognition, the temporal modelling capabilities of LSTM networks align with their use in image captioning, where capturing temporal relationships between objects in an image is vital. Guo et al. [17] presented a semantic guidance network for video captioning, emphasizing the role of attention mechanisms in improving the quality of generated captions. The attention mechanism helps the model focus on specific regions of interest within the video frames, improving the relevance and informativeness of the captions.…”
Section: Related Work
confidence: 99%
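The excerpt above describes attention weighting frame regions against the decoder state. The paper's actual formulation is not given here; the following is a minimal numpy sketch of temporal attention over frame features, where the function name, the bilinear scoring form, and all dimensions are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def temporal_attention(frame_features, decoder_state, W):
    # score each frame feature against the current decoder state
    # (bilinear scoring is one common choice, assumed here)
    scores = frame_features @ W @ decoder_state   # shape (T,)
    weights = softmax(scores)                     # attention weights, sum to 1
    context = weights @ frame_features            # weighted sum over frames
    return context, weights

rng = np.random.default_rng(0)
T, d = 8, 16                        # 8 frames, 16-dim features (illustrative)
frames = rng.normal(size=(T, d))
state = rng.normal(size=d)
W = rng.normal(size=(d, d))
ctx, w = temporal_attention(frames, state, W)
print(ctx.shape)                    # (16,)
```

At each decoding step the context vector, not the raw frame stack, is fed to the caption decoder, which is what lets the model "focus on specific regions of interest."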
“…The spatial information from the different regions was passed through a decoder with detailed features. A semantic-enhanced encoder-decoder network with LSTM was used by the researchers in [24]. The method extracts motion features, appearance features, and global features for video description generation.…”
Section: End-to-end Model
confidence: 99%
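The second excerpt notes that SEN combines motion, appearance, and global features before decoding. The paper's fusion scheme is not reproduced here; a rough numpy sketch under the assumption of simple concatenation followed by a learned projection (all shapes and the projection matrix are hypothetical):

```python
import numpy as np

def fuse_features(appearance, motion, global_feat, W_proj):
    # concatenate per-frame appearance and motion features with the
    # video-level global feature broadcast to every frame, then
    # project to the decoder's input dimension
    T = appearance.shape[0]
    g = np.tile(global_feat, (T, 1))                       # (T, dg)
    fused = np.concatenate([appearance, motion, g], axis=1)
    return fused @ W_proj                                  # (T, d_dec)

rng = np.random.default_rng(1)
T = 8                                    # number of sampled frames
app = rng.normal(size=(T, 32))           # appearance features (e.g. CNN)
mot = rng.normal(size=(T, 32))           # motion features (e.g. optical flow)
glb = rng.normal(size=16)                # one global video descriptor
W = rng.normal(size=(32 + 32 + 16, 64))  # projection to decoder dimension
enc = fuse_features(app, mot, glb, W)
print(enc.shape)                         # (8, 64)
```

The projected sequence would then drive an LSTM caption decoder; concatenation-plus-projection is only one plausible fusion choice, named here to make the excerpt concrete.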