2022
DOI: 10.1109/taslp.2022.3153265

Improved Lite Audio-Visual Speech Enhancement

Abstract: Numerous studies have investigated the effectiveness of audio-visual multimodal learning for speech enhancement (AVSE) tasks, seeking a solution that uses visual data as auxiliary and complementary input to reduce the noise of noisy speech signals. Recently, we proposed a lite audio-visual speech enhancement (LAVSE) algorithm for a car-driving scenario. Compared to conventional AVSE systems, LAVSE requires less online computation and to some extent solves the user privacy problem on facial data. In this study,…

Cited by 17 publications (6 citation statements)
References: 93 publications
“…In this section, we compare the experimental results of the proposed AVSE model with those of other baseline SE models, including two AVSE models [4,23] and one traditional audio-only method, namely LogMMSE [24]. We conducted an objective comparison with two standardized evaluation metrics that are widely used to evaluate SE performance: the perceptual evaluation of speech quality (PESQ) [25] and short-time objective intelligibility measure (STOI) [26].…”
Section: Results (mentioning)
confidence: 99%
“…For the audio components, we followed the setup described in [4]. To form clean-noisy speech pairs for training, the utterances were artificially corrupted by 100 types of noise [20] at five different signal-to-noise ratios (SNRs) ranging from −12 to 12 dB with an increment of 6 dB.…”
Section: Methods (mentioning)
confidence: 99%
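
The citation statement above describes corrupting clean utterances with noise at five SNRs from −12 to 12 dB in 6 dB steps, but it does not say how the noise was scaled. The following is a minimal sketch of one common approach (scaling the noise so the clean-to-noise power ratio hits the target), not the cited authors' code. The names `mix_at_snr`, `measured_snr_db`, and `SNR_LEVELS_DB`, the 16 kHz sampling rate, and the synthetic sine and Gaussian signals are all assumptions made for illustration.

```python
import math
import random


def mean_power(signal):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean power / noise power equals `snr_db` (in dB),
    then add it to `clean`. Hypothetical helper, for illustration only."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = mean_power(clean)
    noise_power = mean_power(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("clean and noise must both have non-zero power")
    # Target noise power is clean_power / 10^(snr_db / 10); the amplitude
    # gain is the square root of the ratio of target to current noise power.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture, given the clean reference signal."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Five levels from -12 to 12 dB in 6 dB steps, as in the quoted setup.
SNR_LEVELS_DB = list(range(-12, 13, 6))

# Synthetic stand-ins for an utterance and a noise recording (assumed data).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# One noisy version of the clean signal per SNR level.
pairs = {snr: mix_at_snr(clean, noise, snr) for snr in SNR_LEVELS_DB}
```

In a real pipeline the noise segment would typically be cropped or looped to the utterance length before mixing; that step is omitted here.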
“…Chuang, S.Y., et al. [12] proposed an improved lite audio-visual speech enhancement (iLAVSE) algorithm for a car-driving scenario. Three stages are involved in the iLAVSE system: data preprocessing, AVSE based on CRNN, and reconstruction.…”
Section: Literature Survey (mentioning)
confidence: 99%