2021
DOI: 10.3390/electronics10212654

Violence Recognition Based on Auditory-Visual Fusion of Autoencoder Mapping

Abstract: In violence recognition, accuracy is reduced by time-axis misalignment and semantic deviation between the visual and auditory information in multimedia. Therefore, this paper proposes an auditory-visual information fusion method based on autoencoder mapping. First, a feature extraction model based on the CNN-LSTM framework is established, and multimedia segments are used as whole inputs to solve the time-axis misalignment of visual and auditory information. Then, a shared semantic subspace is constructed based on an autoencoder mapping model, which fuses segment-level features. …
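As a rough illustration of the segment-level CNN-LSTM extractor the abstract describes, here is a minimal PyTorch sketch. The backbone layers, feature dimensions, and input shape are assumptions for illustration, not the paper's actual architecture; the only point it demonstrates is encoding a whole multimedia segment into one segment-level feature vector.

```python
# Hypothetical sketch of a CNN-LSTM segment-level feature extractor,
# loosely following the abstract; layer sizes and shapes are assumptions.
import torch
import torch.nn as nn

class CnnLstmExtractor(nn.Module):
    """Encodes a whole multimedia segment (a sequence of frames or
    spectrogram slices) into a single segment-level feature vector."""

    def __init__(self, in_channels: int = 3, feat_dim: int = 256):
        super().__init__()
        # Per-frame CNN backbone (assumed architecture, not the paper's exact one).
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> (B*T, 64, 1, 1)
        )
        # LSTM aggregates the per-frame features over the segment's time axis.
        self.lstm = nn.LSTM(input_size=64, hidden_size=feat_dim, batch_first=True)

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, time, channels, height, width)
        b, t, c, h, w = segment.shape
        frames = segment.reshape(b * t, c, h, w)
        per_frame = self.cnn(frames).flatten(1).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(per_frame)
        return h_n[-1]                 # (batch, feat_dim) segment-level feature


if __name__ == "__main__":
    extractor = CnnLstmExtractor()
    dummy_segment = torch.randn(2, 16, 3, 64, 64)  # 2 segments, 16 frames each
    print(extractor(dummy_segment).shape)          # torch.Size([2, 256])
```

Feeding the segment as a whole sequence (rather than aligning individual frames across modalities) is what sidesteps the time-axis misalignment problem the abstract mentions.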

Cited by 7 publications (1 citation statement)
References: 29 publications
“…Ref. [24] introduces a semi-supervised approach into pre-trained I3D, which can improve accuracy by removing redundant data and focusing on useful visual information. Ref. [38] uses a CNN-LSTM to extract visual and auditory information simultaneously; then a shared semantic subspace is constructed based on an autoencoder mapping model, which can fuse segment-level features.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
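The citation statement summarizes the fusion step: modality-specific features are mapped into a shared semantic subspace by an autoencoder and fused there. Below is a hedged PyTorch sketch of that idea; the dimensions, the reconstruction loss, the averaging fusion, and the two-class head are all assumptions, not the paper's exact design.

```python
# Hypothetical sketch of autoencoder-based mapping into a shared semantic
# subspace for fusing visual and auditory segment features.
import torch
import torch.nn as nn

class SharedSubspaceAutoencoder(nn.Module):
    def __init__(self, vis_dim: int = 256, aud_dim: int = 128, shared_dim: int = 64):
        super().__init__()
        # Modality-specific encoders map each feature into the shared subspace.
        self.enc_vis = nn.Sequential(nn.Linear(vis_dim, shared_dim), nn.ReLU())
        self.enc_aud = nn.Sequential(nn.Linear(aud_dim, shared_dim), nn.ReLU())
        # Decoders reconstruct the original features (autoencoder objective).
        self.dec_vis = nn.Linear(shared_dim, vis_dim)
        self.dec_aud = nn.Linear(shared_dim, aud_dim)
        # Classifier on the fused shared-subspace code (violent / non-violent).
        self.classifier = nn.Linear(shared_dim, 2)

    def forward(self, vis_feat: torch.Tensor, aud_feat: torch.Tensor):
        z_vis, z_aud = self.enc_vis(vis_feat), self.enc_aud(aud_feat)
        fused = 0.5 * (z_vis + z_aud)                 # simple averaging fusion (assumed)
        recon_loss = (
            nn.functional.mse_loss(self.dec_vis(z_vis), vis_feat)
            + nn.functional.mse_loss(self.dec_aud(z_aud), aud_feat)
        )
        return self.classifier(fused), recon_loss


if __name__ == "__main__":
    model = SharedSubspaceAutoencoder()
    logits, recon = model(torch.randn(4, 256), torch.randn(4, 128))
    print(logits.shape, recon.item())   # torch.Size([4, 2]) and a scalar loss
```

In training, the reconstruction term keeps the shared codes faithful to each modality while the classification term shapes the subspace so that semantically equivalent visual and auditory cues land close together.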