2010
DOI: 10.1049/iet-ipr.2009.0042

Visual voice activity detection with optical flow

Cited by 30 publications (24 citation statements)
References 6 publications

“…For example, in a cocktail party scenario, looking at the speaker's face, or more precisely the movements of the lip region, helps one to comprehend the speech of interest. The bimodal coherence of audio and visual stimuli was shown to be useful for voice activity detection [7], [8], [9]. However, the above visual VAD algorithms use either only static or only dynamic features.…”
Section: Introduction (mentioning)
confidence: 99%
“…Visual VAD has potential applications in noise reduction, speech separation or extraction and speech recognition. Some visual VAD algorithms have been proposed [1], [2], [3]. The method in [1] first projects the mouth region into principal component space, then models silent and non-silent periods with a single Gaussian distribution and a Gaussian mixture distribution respectively for the decision rule.…”
Section: Introduction (mentioning)
confidence: 99%
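The decision rule attributed to [1] in the quote above (project the mouth region onto principal components, then model silent frames with a single Gaussian and non-silent frames with a Gaussian mixture) can be illustrated with a minimal NumPy sketch. It is not the cited authors' code: the synthetic mouth-region vectors, the diagonal covariances, the choice of three principal components and two mixture components, the zero log-likelihood-ratio threshold, and all helper names (`pca_fit`, `fit_gmm`, `synth`, `is_speech`) are assumptions made for illustration.

```python
import numpy as np

def pca_fit(X, n_components):
    """Mean and top principal axes of X (one row per frame)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def gauss_loglik(Z, mu, var):
    """Row-wise log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (Z - mu) ** 2 / var, axis=1)

def gmm_loglik(Z, w, mu, var):
    """Row-wise log density of a diagonal GMM (log-sum-exp over components)."""
    comp = np.stack([np.log(w[j]) + gauss_loglik(Z, mu[j], var[j]) for j in range(len(w))], axis=1)
    m = comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))

def fit_gmm(Z, k, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM with plain EM."""
    rng = np.random.default_rng(seed)
    n = len(Z)
    w = np.full(k, 1.0 / k)
    mu = Z[rng.choice(n, k, replace=False)]
    var = np.tile(Z.var(axis=0) + 1e-6, (k, 1))
    for _ in range(n_iter):
        # E-step: per-frame responsibilities, normalised in the log domain.
        comp = np.stack([np.log(w[j]) + gauss_loglik(Z, mu[j], var[j]) for j in range(k)], axis=1)
        r = np.exp(comp - comp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = r.sum(axis=0) + 1e-10
        w = nk / n
        mu = (r.T @ Z) / nk[:, None]
        var = np.maximum((r.T @ Z**2) / nk[:, None] - mu**2, 1e-6)
    return w, mu, var

rng = np.random.default_rng(1)
D = 64
base = rng.normal(size=D)      # hypothetical "closed mouth" appearance vector
open_dir = rng.normal(size=D)  # hypothetical appearance change as the mouth moves

def synth(n, speaking):
    """Hypothetical mouth-ROI vectors; speech frames are displaced along open_dir."""
    amp = rng.choice([-2.0, 2.0], size=n) + 0.3 * rng.normal(size=n) if speaking else np.zeros(n)
    return base + amp[:, None] * open_dir + 0.1 * rng.normal(size=(n, D))

train_sil, train_sp = synth(200, False), synth(200, True)
mean, axes = pca_fit(np.vstack([train_sil, train_sp]), n_components=3)

def project(X):
    return (X - mean) @ axes.T

z_sil, z_sp = project(train_sil), project(train_sp)
sil_mu, sil_var = z_sil.mean(axis=0), z_sil.var(axis=0)  # single Gaussian for silence
gmm = fit_gmm(z_sp, k=2)                                 # Gaussian mixture for speech

def is_speech(X, threshold=0.0):
    """Log-likelihood-ratio test: mixture (speech) vs single Gaussian (silence)."""
    z = project(X)
    return gmm_loglik(z, *gmm) - gauss_loglik(z, sil_mu, sil_var) > threshold

test_X = np.vstack([synth(100, False), synth(100, True)])
truth = np.r_[np.zeros(100, bool), np.ones(100, bool)]
accuracy = np.mean(is_speech(test_X) == truth)
```

Comparing the two class likelihoods is a likelihood-ratio test. Raising `threshold` trades missed speech for fewer false alarms.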
“…The algorithm in [2] uses a filtered dynamic visual feature calculated from geometric visual features with multi-thresholds for silence detection. The approach in [3] estimates lip motion based on complex discrete wavelet transform, then applies the hidden Markov model for the statistical characterization of the lip motion, which is finally thresholded for the VAD. However, the algorithms above use either only static or only dynamic features, and the features used are fixed for different objects (speakers).…”
Section: Introduction (mentioning)
confidence: 99%
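The quote does not say which geometric feature, filter or threshold scheme [2] uses. The following minimal sketch assumes a per-frame mouth-opening height as the geometric feature. It takes a moving average of its absolute frame-to-frame change as the filtered dynamic feature and uses a two-threshold (hysteresis) rule as one possible reading of "multi-thresholds". The signal, window length and thresholds are all hypothetical.

```python
import numpy as np

def dynamic_feature(geom, win=5):
    """Hypothetical filtered dynamic feature: moving average of the absolute
    frame-to-frame change of a geometric lip measurement."""
    delta = np.abs(np.diff(geom, prepend=geom[0]))
    return np.convolve(delta, np.ones(win) / win, mode="same")

def hysteresis_vad(feat, t_high, t_low):
    """Two-threshold decision: switch to speech when the feature exceeds t_high,
    and back to silence only once it falls below t_low."""
    active = np.zeros(len(feat), dtype=bool)
    speaking = False
    for i, v in enumerate(feat):
        if not speaking and v > t_high:
            speaking = True
        elif speaking and v < t_low:
            speaking = False
        active[i] = speaking
    return active

rng = np.random.default_rng(0)
t = np.arange(200)
# Hypothetical mouth-opening height per frame: still lips except frames 50-149.
height = 1.0 + 0.005 * rng.normal(size=200)
talk = (t >= 50) & (t < 150)
height[talk] += 0.5 * np.sin(2 * np.pi * (t[talk] - 50) / 8)
active = hysteresis_vad(dynamic_feature(height), t_high=0.1, t_low=0.05)
```

Using a lower release threshold than the onset threshold keeps brief dips in lip motion inside a speech segment from toggling the decision.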
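“…Research using video signals mainly exploits lip movement [8], and as multimodal systems that use audio and video together become more widespread, many attempts have also been made to improve voice activity detection in acoustically noisy environments by using the video signal [9][10][11][12][13][14]. Video-based voice activity detection algorithms have been proposed in various forms, differing in how features are extracted and in how the extracted features are used to decide between speech and non-speech.…”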
unclassified
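“…Navarathna et al. applied to voice activity detection a series of transforms previously used in visual speech recognition [13]; they used dynamic as well as static features and classified speech/non-speech segments with Gaussian mixture modelling. Aubrey et al. also attempted to detect speech segments from lip movement, modelling the variation of the optical flow computed for each video frame with an HMM [14]. [14,15].…”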
unclassified
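The pipeline described for Aubrey et al. [14] (optical flow from each frame, with its variation modelled by an HMM) can be sketched as follows. The quote names neither the flow estimator nor the HMM topology. This sketch therefore assumes a single-window Lucas–Kanade estimate on the upper and lower halves of a synthetic mouth region and takes the speed at which the lips separate as the per-frame feature. It then decodes a two-state (silence/speech) HMM with hand-set Gaussian emissions using the Viterbi algorithm. The frame generator, the feature and every parameter are hypothetical; in practice the HMM parameters would be trained, for example with Baum–Welch.

```python
import numpy as np

H, W = 32, 32
rows, cols = np.mgrid[0:H, 0:W]

def mouth_frame(gap, rng):
    """Hypothetical synthetic mouth ROI: two horizontal 'lips' `gap` pixels apart."""
    env = np.exp(-((cols - W / 2) ** 2) / (2 * 8.0**2))
    upper = np.exp(-((rows - (H / 2 - gap / 2)) ** 2) / (2 * 1.5**2))
    lower = np.exp(-((rows - (H / 2 + gap / 2)) ** 2) / (2 * 1.5**2))
    return env * (upper + lower) + 0.01 * rng.normal(size=(H, W))

def lucas_kanade(prev, curr):
    """One (u, v) flow vector for a whole window: least squares on the
    brightness-constancy equation Ix*u + Iy*v + It = 0."""
    iy, ix = np.gradient(0.5 * (prev + curr))  # axis 0 = rows (y), axis 1 = cols (x)
    it = curr - prev
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    uv, *_ = np.linalg.lstsq(a, -it.ravel(), rcond=None)
    return uv

def opening_speed(prev, curr):
    """|v_lower - v_upper|: how fast the lips move apart or together."""
    half = prev.shape[0] // 2
    v_up = lucas_kanade(prev[:half], curr[:half])[1]
    v_low = lucas_kanade(prev[half:], curr[half:])[1]
    return abs(v_low - v_up)

def viterbi_2state(x, means, stds, p_stay=0.95):
    """Most likely silence(0)/speech(1) path for feature sequence x under a
    2-state HMM with Gaussian emissions (hand-set parameters, not trained)."""
    means, stds = np.asarray(means), np.asarray(stds)
    # Constant -0.5*log(2*pi) is dropped: it is identical for both states.
    log_emit = -0.5 * ((x[:, None] - means) / stds) ** 2 - np.log(stds)
    log_trans = np.log(np.array([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]))
    n = len(x)
    score = np.log(0.5) + log_emit[0]
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans  # cand[i, j]: previous state i, next state j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = np.zeros(n, dtype=int)
    path[-1] = score.argmax()
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

rng = np.random.default_rng(0)
T = 160
t = np.arange(T)
speaking = (t >= 40) & (t < 120)
# Hypothetical lip gap: 8 px at rest, opening to 11 px and closing every 10 frames.
gap = np.where(speaking, 8 + 1.5 * (1 - np.cos(2 * np.pi * (t - 40) / 10)), 8.0)
frames = [mouth_frame(g, rng) for g in gap]
feat = np.array([0.0] + [opening_speed(frames[i - 1], frames[i]) for i in range(1, T)])
labels = viterbi_2state(feat, means=[0.0, 0.6], stds=[0.05, 0.3])
```

Splitting the region into upper and lower halves matters here: a single flow vector for the whole mouth would average the opposite motions of the two lips to roughly zero. The HMM's high self-transition probability keeps isolated low-motion frames inside an utterance from being labelled as silence.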