2021
DOI: 10.1049/ipr2.12239
Crowd activity recognition in live video streaming via 3D‐ResNet and region graph convolution network

Abstract: Since the advent of the we-media era, the live video industry has grown explosively. For large-scale live video streaming, especially streams containing crowd events that may have a great social impact, effectively identifying and supervising crowd activity is of great value for the healthy development of the live video industry. Existing crowd activity recognition mainly uses visual information and rarely fully exploits the correlation or external knowledge between c…


Cited by 3 publications (1 citation statement)
References 31 publications
“…After optimizing the training data to improve the model and increase accuracy, the model can be applied to generate summaries. For video content, the main task is to extract key frames that represent the main content of the video and use ResNet to convert each keyframe into a fixed-length feature vector [19]. At the same time, user information is processed, including embedding vectorization of elements such as timestamps, address labels, and network IP addresses; each keyframe and audio feature vector is then merged with these embeddings into a comprehensive user-token feature vector that summarizes both the multimodal content and the user information.…”
Section: Identity Management
confidence: 99%
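A minimal sketch of the pipeline described in the citation statement, assuming a PyTorch/torchvision setup: keyframes are encoded with a ResNet into fixed-length vectors, user metadata (timestamp, address label, IP) is embedded, and everything is concatenated into one combined token vector. The helper names, embedding dimensions, and metadata fields are illustrative assumptions, not the cited paper's actual implementation.

```python
# Sketch only: encode keyframes with ResNet, embed user metadata,
# and merge into one comprehensive feature vector.
import torch
import torch.nn as nn
from torchvision import models, transforms

# Pretrained ResNet-50 with the classification head removed, so each
# keyframe maps to a fixed 2048-d feature vector.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = nn.Identity()
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def encode_keyframes(frames):
    """Encode a list of PIL keyframes into a (num_frames, 2048) tensor."""
    batch = torch.stack([preprocess(f) for f in frames])
    with torch.no_grad():
        return resnet(batch)

class UserInfoEmbedding(nn.Module):
    """Hypothetical user-metadata embedding: timestamp, address label and
    IP bucket are each mapped to a small vector and concatenated."""
    def __init__(self, n_addr_labels=1000, n_ip_buckets=4096, dim=32):
        super().__init__()
        self.time_proj = nn.Linear(1, dim)            # continuous timestamp
        self.addr_emb = nn.Embedding(n_addr_labels, dim)
        self.ip_emb = nn.Embedding(n_ip_buckets, dim)

    def forward(self, timestamp, addr_id, ip_bucket):
        t = self.time_proj(timestamp.unsqueeze(-1))
        return torch.cat([t, self.addr_emb(addr_id), self.ip_emb(ip_bucket)], dim=-1)

def build_token_vector(frame_feats, audio_feat, user_feat):
    """Merge pooled keyframe features, an audio feature vector, and the
    user-metadata embedding (all 1-D tensors) into one token vector."""
    return torch.cat([frame_feats.mean(dim=0), audio_feat, user_feat], dim=-1)
```

In this sketch the per-keyframe features are mean-pooled before concatenation; other fusion choices (per-keyframe tokens, attention pooling) would fit the same description in the quoted statement.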