2015
DOI: 10.1109/taslp.2015.2405481

Audio-Visual Voice Activity Detection Using Diffusion Maps

Abstract: The performance of traditional voice activity detectors significantly deteriorates in the presence of highly nonstationary noise and transient interferences. One solution is to incorporate a video signal which is invariant to the acoustic environment. Although several voice activity detectors based on the video signal were recently presented, merely few detectors which are based on both the audio and the video signals exist in the literature to date. In this paper, we present an audio-visual voice activity det…

Cited by 52 publications (50 citation statements)
References 41 publications
“…However, due to the challenging problem setting considered in this study, for which the speech is detected at a fine resolution, we found that incorporating the visual information as proposed in [29] does not improve the detection scores. Hence the simulation of [29] is not presented in the plots.…”
Section: Simulation Results (mentioning)
confidence: 80%
“…In addition, we note that we also compared the proposed algorithm to the algorithm we recently presented in [29]. In [29], we proposed a separate representation of each view and estimated voice activity separately based on each representation; then, we fused the views by merging the estimators. However, due to the challenging problem setting considered in this study, for which the speech is detected at a fine resolution, we found that incorporating the visual information as proposed in [29] does not improve the detection scores.…”
Section: Simulation Results (mentioning)
confidence: 99%
“…As a result, the DKS can be viewed as a low-pass filter, which controls the spectral bandwidth. In addition, the DKS can be recast in terms of the diffusion distance, a notion of distance induced by diffusion maps [10] that was shown useful in a broad range of applications, e.g., [36,12,27]. For more details see Diffusion maps.…”
Section: Proposed Methods (mentioning)
confidence: 99%
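For readers unfamiliar with the diffusion distance mentioned in the statement above, the following is a minimal sketch of a textbook diffusion-maps embedding. It is not taken from the cited paper or from the citing work. The function name `diffusion_map`, the Gaussian kernel, and the parameters `epsilon`, `n_components`, and `t` are illustrative assumptions. The point it shows is that Euclidean distance between embedded points equals the diffusion distance when all nontrivial eigenvectors are kept.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2, t=1):
    """Textbook diffusion-maps embedding (illustrative sketch, hypothetical API).

    X            : (n, d) array of samples
    epsilon      : Gaussian kernel scale (assumed, must be tuned per data set)
    n_components : number of nontrivial diffusion coordinates to keep
    t            : number of diffusion steps
    """
    # Pairwise squared Euclidean distances and Gaussian affinity kernel.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)

    # Row sums define the Markov matrix P = D^{-1} K. We diagonalize the
    # symmetric conjugate A = D^{-1/2} K D^{-1/2}, which shares P's eigenvalues.
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)

    # Sort eigenpairs by decreasing eigenvalue; the first one is trivial.
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P, scaled so that Euclidean distance in the
    # embedding matches the diffusion distance weighted by 1 / stationary dist.
    psi = np.sqrt(d.sum()) * vecs / np.sqrt(d)[:, None]

    # Drop the constant eigenvector and weight the rest by eigenvalue^t.
    return (vals[1:n_components + 1] ** t) * psi[:, 1:n_components + 1]
```

Keeping only the leading coordinates truncates the sum over eigenvalues, which is the usual way the embedding is used for dimensionality reduction; the distance identity then holds only approximately.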
“…Their goal is to reduce the amount of noise pollution by redesigning the air compressor. The work presented in [28] highlights an improved way to collect audio data for voice activity detection. It uses both audio and visual signals with a supervised learning algorithm to detect what audio frames correspond to human voice.…”
Section: Related Work (mentioning)
confidence: 99%