2022
DOI: 10.48550/arxiv.2205.07611
Preprint

Noise-Tolerant Learning for Audio-Visual Action Recognition

Abstract: Video recognition has recently advanced with the help of multi-modal learning, which integrates multiple modalities to improve the performance or robustness of a model. Although various multi-modal learning methods have been proposed and achieve remarkable recognition results, almost all of them rely on high-quality manual annotations and assume that the modalities in multi-modal data provide relevant semantic information. Unfortunately, most widely used video datasets are collected from the …

Cited by 1 publication (1 citation statement)
References 36 publications

“…Nowadays, audio-visual fusion has become one of the most important solutions to solve the problem of noise interference [42,43]. The performance of some audio-visual speech recognition models has surpassed human capabilities.…”
Section: Audio-visual Fusion
Citation type: mentioning
confidence: 99%