2019
DOI: 10.48550/arxiv.1911.05449
Preprint

Crowd Video Captioning

Liqi Yan,
Mingjian Zhu,
Changbin Yu

Abstract: Describing a video automatically with natural language is a challenging task in computer vision. In most cases, news reports cover the on-site situation of major events, while the situation of the off-site spectators at the entrances and exits is neglected, even though it also arouses people's interest. Since deploying reporters at entrances and exits costs a great deal of manpower, automatically describing the behavior of a crowd of off-site spectators is significant and remains an open problem. To tackle thi…


Cited by 1 publication (1 citation statement)
References 14 publications
“…CVC (Yan et al. 2019) proposed a system using the encoder-decoder (ED) approach to describe several characteristics of an off-site audience crowd, such as the number of people in the crowd, its movement conditions, and its flow direction. The model employs a 2D/3D CNN for crowd feature extraction from video, which then feeds into an LSTM-GRU-based language model for captioning.…”
Section: CNN-CNN (mentioning)
confidence: 99%
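The quoted statement outlines a standard encoder-decoder (ED) captioning pipeline: a 2D/3D CNN encodes the crowd video into a feature vector, and a recurrent language model decodes that feature into a caption. The PyTorch sketch below illustrates that structure only; the class names, layer sizes, and the choice of a plain GRU decoder are illustrative assumptions, not the authors' released implementation.

# Minimal sketch of the ED pipeline described above: a small 3D CNN encodes a
# video clip into one feature vector, which initializes a GRU language model
# that emits caption-token logits. All sizes here are assumed, not from CVC.
import torch
import torch.nn as nn

class CrowdVideoEncoder(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        # 3D convolutions over (channels, time, height, width) video clips.
        self.conv = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # global pooling -> one vector per clip
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, clip):                # clip: (B, 3, T, H, W)
        x = self.conv(clip).flatten(1)      # (B, 128)
        return self.fc(x)                   # (B, feat_dim)

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=512, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.init_h = nn.Linear(feat_dim, hid_dim)  # video feature -> initial hidden state
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, video_feat, tokens):  # tokens: (B, L) word indices
        h0 = torch.tanh(self.init_h(video_feat)).unsqueeze(0)  # (1, B, hid_dim)
        emb = self.embed(tokens)            # (B, L, emb_dim)
        out, _ = self.gru(emb, h0)          # (B, L, hid_dim)
        return self.out(out)                # (B, L, vocab_size) logits

# Toy usage: one 16-frame RGB clip and a 5-token partial caption.
encoder, decoder = CrowdVideoEncoder(), CaptionDecoder(vocab_size=1000)
clip = torch.randn(1, 3, 16, 64, 64)
tokens = torch.randint(0, 1000, (1, 5))
logits = decoder(encoder(clip), tokens)     # (1, 5, 1000)

Training such a model would typically minimize cross-entropy between these logits and the ground-truth caption tokens; the actual CVC system may differ in feature extractors, decoder cells, and training details.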