2001
DOI: 10.1109/97.957270

Analysis and improvement of a statistical model-based voice activity detector

Abstract: From investigation of a statistical model-based voice activity detector (VAD), it is found that the likelihood ratio defined in the VAD has a fundamental problem at the offset regions of speech signals. Thus, we analyze the behavioral mechanism of the likelihood ratio, identify the reason for the unwanted phenomenon, and propose a solution based on a smoothed likelihood ratio. Objective test results show that the proposed method gives significant improvement to the original VAD. Additionally, the improved VAD …

Cited by 84 publications (5 citation statements)
References: 3 publications
“…We consider two different pdfs as the candidate distributions for statistical modeling. The first one is the complex Gaussian pdf which is most widely applied to characterize the DFT coefficients' distribution in speech analysis [1,2,3,5,6,8,9,10,11,12]. With the Gaussian pdf assumption, the distributions of the noisy spectral components conditioned on both hypotheses are given by:…”
Section: Statistical Models For Noisy Speech (mentioning)
confidence: 99%
“…This algorithm has shown a high detection accuracy compared with the conventional algorithms. The statistical model-based method has been further improved with the incorporation of the soft decision scheme [2,3]. Most of the conventional VAD algorithms which mainly operate in the discrete Fourier transform (DFT) domain assume that the clean speech and noise spectra are characterized by Gaussian distributions.…”
Section: Introduction (mentioning)
confidence: 99%
“…In face‐to‐face collaboration, spoken interactions dominate the ways in which students demonstrate these collaborative activities. Capturing audio data from these natural group interactions provides a more in‐the‐moment description of student interactions compared to thoroughly thought about responses found in text‐based data (Cho & Kondoz, 2001; Donnelly et al., 2017).…”
Section: Introduction (mentioning)
confidence: 99%
“…As the demands for more accurate VADs in noisy conditions increase, a lot of efforts have been made to enhance the performance of VAD [2][3][4][5][6][7][8][9][10][11][12][13][14]. One successful approach is the statistical model-based VAD (SMVAD) proposed by Sohn et al [2].…”
Section: Introduction (mentioning)
confidence: 99%
“…More recently, various efforts have been made to optimize SMVAD by modifying the decision rule originally derived from the likelihood ratio test (LRT). To decrease detection errors at speech offset regions, Sohn et al [3] proposed an effective hang-over scheme based on the hidden Markov model (HMM), and Cho and Kondoz [4] proposed smoothed likelihood ratios (SLRs) in the decision rule. Other approaches have involved various statistical models for noise and noisy speech [5], and discriminative weight training (DWT) scheme [6].…”
Section: Introduction (mentioning)
confidence: 99%
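
The citation statements above describe a decision rule derived from a likelihood ratio test under a complex Gaussian model, and a smoothed likelihood ratio used to reduce errors at speech offsets. The following is a minimal sketch of that idea, not the implementation from the cited paper. It assumes the usual per-bin Gaussian log likelihood ratio, written in terms of an a priori SNR `xi` and an a posteriori SNR `gamma`, and a first-order recursive smoothing of the log ratio across frames. The function names, the fixed `xi`, the smoothing factor `kappa`, and the threshold are all illustrative assumptions; a real detector would estimate the noise variance and `xi` frame by frame.

```python
import math


def log_likelihood_ratios(power, noise_var, xi):
    """Per-bin log likelihood ratio under a complex Gaussian model.

    power     -- |X_k|^2 for each frequency bin of one frame
    noise_var -- noise variance for each bin (assumed known here)
    xi        -- a priori SNR for each bin (assumed fixed here)
    """
    ratios = []
    for p, n, x in zip(power, noise_var, xi):
        gamma = p / n  # a posteriori SNR
        ratios.append(gamma * x / (1.0 + x) - math.log1p(x))
    return ratios


def smooth_log_lr(previous, current, kappa):
    """First-order recursive smoothing of the log ratios (an assumed form)."""
    return [kappa * s + (1.0 - kappa) * c for s, c in zip(previous, current)]


def vad_decisions(frames, noise_var, xi, kappa=0.9, threshold=0.5):
    """Return one speech/non-speech decision per frame.

    kappa=0.0 disables smoothing and gives a plain frame-by-frame test.
    kappa and threshold are hypothetical values chosen for illustration.
    """
    smoothed = None
    decisions = []
    for power in frames:
        llr = log_likelihood_ratios(power, noise_var, xi)
        if smoothed is None:
            smoothed = llr  # the first frame has no history to smooth with
        else:
            smoothed = smooth_log_lr(smoothed, llr, kappa)
        # Average the (smoothed) log ratios over bins and threshold the result.
        score = sum(smoothed) / len(smoothed)
        decisions.append(score > threshold)
    return decisions


# Toy input: two high-energy frames followed by two noise-level frames.
noise_var = [1.0, 1.0]
xi = [4.0, 4.0]
frames = [[10.0, 10.0], [10.0, 10.0], [1.0, 1.0], [1.0, 1.0]]

plain = vad_decisions(frames, noise_var, xi, kappa=0.0)
smoothed = vad_decisions(frames, noise_var, xi, kappa=0.9)
```

On this toy input the unsmoothed test switches to non-speech as soon as the energy drops, while the smoothed score decays gradually and keeps the decision active for the trailing frames. How long it holds depends on `kappa`, which here is an arbitrary choice.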