2012
DOI: 10.1016/j.compeleceng.2012.09.003
Voice activity detection algorithm using nonlinear spectral weights, hangover and hangbefore criteria

Cited by 11 publications (5 citation statements)
References 22 publications
“…For example, in [50], a decision-tree algorithm that combines the scores of HMM-based speech/non-speech models with speech pulse information was used to reject far-field speech in speech recognition systems. Both [21,52] and [50] use statistical models to characterize speech and non-speech signals, with decision logic governing the switching between speech and non-speech states. The difference is that in the GMM-VAD of [21], state duration is governed by the number of speech frames (as detected by the GMMs) in a fixed-length buffer, and in the GMM-VAD of [52], state duration is governed by a hangover and hangbefore scheme that detects consonants occurring at the beginning, middle and end of words; whereas in the HMM-VAD of [50], state duration is controlled by the state-transition probabilities of the HMMs and by speech pulse information.…”
Section: Introduction
confidence: 99%
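The hangover/hangbefore idea described in the statement above can be sketched as a post-processing pass over frame-wise speech/non-speech decisions: frames shortly before a detected speech onset and shortly after a speech offset are relabelled as speech, so that weak word-initial and word-final consonants are not clipped. This is a minimal illustrative sketch under assumed parameter values, not the exact scheme of [52]; the function name and defaults are invented for illustration.

```python
def smooth_vad(raw, hangbefore=3, hangover=5):
    """Apply hangbefore/hangover smoothing to raw per-frame VAD decisions.

    raw        -- list of 0/1 decisions from the frame-level detector
    hangbefore -- frames before each speech onset relabelled as speech
                  (catches weak word-initial consonants)
    hangover   -- frames after each speech offset kept as speech
                  (catches weak word-final consonants)
    """
    n = len(raw)
    out = list(raw)
    for i, v in enumerate(raw):
        if v == 1:
            # hangbefore: extend the speech region backwards
            for j in range(max(0, i - hangbefore), i):
                out[j] = 1
            # hangover: extend the speech region forwards
            for j in range(i + 1, min(n, i + 1 + hangover)):
                out[j] = 1
    return out
```

For example, an isolated two-frame speech burst is widened on both sides, turning a raw decision sequence like `0 0 0 0 1 1 0 0 0 0 0 0` into a single contiguous speech segment.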
“…The closest one is the VFR VAD [3], which is our previous work and also uses the a posteriori SNR-weighted energy distance as the feature for the VAD decision. The GMM-NLSM VAD [27] provides good performance as well, but still with a 3% (absolute) higher FER than rVAD; it should also be noted that GMM-NLSM is a supervised VAD whose GMMs are trained on the multicondition training data of the Aurora 2 database. Next in line is the VAD method in the DSR AFE front-end [25], which is unsupervised and gives a more than 5% (absolute) higher FER than rVAD.…”
Section: Comparison With Referenced Methods and Evaluation Of Differe…
confidence: 95%
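The a posteriori SNR-weighted energy distance mentioned in the statement above can be illustrated loosely: per-frame energy distances are emphasised where the a posteriori SNR (frame energy over a noise-energy estimate) is high, so speech-dominated frames contribute more to the decision feature. This is only a rough sketch of the idea, not the exact rVAD/VFR definition; the function name and weighting choice are assumptions.

```python
import numpy as np

def snr_weighted_energy_distance(frames, noise_energy):
    """Sketch of an a posteriori SNR-weighted energy distance feature.

    frames       -- 2-D array, one windowed signal frame per row
    noise_energy -- scalar noise-energy estimate (e.g. from leading
                    non-speech frames); assumed given here
    """
    energy = np.sum(frames ** 2, axis=1)           # per-frame energy
    post_snr = energy / max(noise_energy, 1e-12)   # a posteriori SNR
    # energy distance between consecutive frames, weighted by the SNR
    dist = np.abs(np.diff(energy, prepend=energy[0]))
    return post_snr * dist
```

The weighted feature is then typically thresholded (with smoothing such as the hangover/hangbefore pass) to obtain frame-wise speech/non-speech decisions.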
“…The comparison in this table is conducted in terms of frame error rate (FER), since results for LTSV and GMM-NLSM are available only in terms of FER. Note that identical experimental settings and labels are used across [3,17,27] and the present work, so the comparison is valid.…”
Section: Comparison With Referenced Methods and Evaluation Of Differe…
confidence: 99%
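The frame error rate used in the comparison above is simply the fraction of frames whose speech/non-speech label disagrees with the reference annotation. A minimal sketch (function name assumed):

```python
def frame_error_rate(predicted, reference):
    """Fraction of frames whose 0/1 speech label disagrees with the
    reference labels; predicted and reference must be equal length."""
    assert len(predicted) == len(reference) and len(reference) > 0
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)
```

An absolute FER difference, as quoted in the statements above, is just the difference between two such fractions computed on the same reference labels.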