2013
DOI: 10.1186/1687-4722-2013-21

Efficient voice activity detection algorithm using long-term spectral flatness measure

Abstract: This paper proposes a novel and robust voice activity detection (VAD) algorithm utilizing a long-term spectral flatness measure (LSFM), which is capable of working at signal-to-noise ratios (SNRs) of 10 dB and lower. This new LSFM-based VAD improves speech detection robustness in various noisy environments by employing a low-variance spectrum estimate and an adaptive threshold. The discriminative power of the new LSFM feature is shown by conducting an analysis of the speech/non-speech LSFM distributions. The proposed…

Citations: cited by 53 publications (34 citation statements)
References: 19 publications
“…In subjective evaluation, a human listener evaluates for VAD errors, whereas, numerical computations are carried out for objective evaluation. However, subjective evaluation alone is insufficient to examine the VAD performance, because listening tests like ABC fail to consider the effects of false alarm [32,34,35]. Hence numerical computations through objective evaluation help in reporting the performance of the proposed VAD algorithm.…”
Section: Performance Evaluation (mentioning)
confidence: 99%
“…Dominant frequency component D_x(m) of each frame is calculated using steps described in section II-A and spectral envelopes are also estimated. Then we follow the same procedure for LSFM feature L_x(m) computation as stated in Yanna Ma et al. [15]. The power spectrum of the segmented signal is estimated using Welch-Bartlett method since it is better than periodogram [22].…”
Section: The Proposed Algorithm (mentioning)
confidence: 99%
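
The LSFM computation that this statement refers to can be illustrated with a minimal sketch. It rests on assumptions that are not spelled out on this page: the Welch-Bartlett estimate is taken as the average of the M most recent frame periodograms, and the LSFM at frame m is taken as the sum over frequency bins of log10(GM/AM), where GM and AM are the geometric and arithmetic means of the smoothed spectrum over the last R frames. The function names `bartlett_welch` and `lsfm` and the `eps` floor are hypothetical, chosen only for illustration.

```python
import math


def bartlett_welch(periodograms, M):
    """Assumed Welch-Bartlett smoothing: average the M most recent
    periodograms ending at each frame (output starts at frame M-1)."""
    smoothed = []
    for n in range(M - 1, len(periodograms)):
        window = periodograms[n - M + 1 : n + 1]
        num_bins = len(window[0])
        smoothed.append(
            [sum(frame[k] for frame in window) / M for k in range(num_bins)]
        )
    return smoothed


def lsfm(spectra, m, R, eps=1e-12):
    """Assumed LSFM at frame m: sum over bins of log10(GM / AM), with both
    means taken over the R spectra ending at frame m."""
    if m < R - 1 or m >= len(spectra):
        raise ValueError("need R spectra ending at a valid frame index m")
    window = spectra[m - R + 1 : m + 1]
    total = 0.0
    for k in range(len(window[0])):
        # eps is a hypothetical floor that keeps log10 defined for empty bins.
        values = [max(frame[k], eps) for frame in window]
        log_gm = sum(math.log10(v) for v in values) / R
        log_am = math.log10(sum(values) / R)
        total += log_gm - log_am
    return total
```

Under this definition a spectrum that does not change over the R frames gives a value of about 0, while a spectrum that fluctuates strongly over time gives a negative value. With the settings quoted on this page one would call `lsfm(bartlett_welch(periodograms, 10), m, 30)`.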
“…Proper selection of this two parameter will increase discriminating power of speech and non-speech and hence a better VAD. We evaluate our proposed algorithm following the method mentioned in [15] using the Edinburgh corpus database and NOISEX92 database (in section IV). Experimentally we also found that the values for R = 30 and M = 10 are same as in [15].…”
Section: B. Selection of R and M (mentioning)
confidence: 99%
“…A voice activity detector basically consists of two main processes, namely feature extraction and classification. Some of the popular features in speech processing are zero crossing rate [4], energy [5], signal-to-noise ratio, spectral flatness [6], correlation [7], etc. Instead of modeling the dynamic noise features using support vector machine (SVM) trained on noise-labeled training data [8], some recent VADs focus on the extraction of robust speech features such as the formant frequencies of eight English vowels [9].…”
Section: Discriminative Features and Classification (mentioning)
confidence: 99%
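
Three of the features named in this statement can be illustrated with a short sketch. The definitions below are common textbook forms, assumed here and not taken from the cited works: zero crossing rate as the fraction of adjacent sample pairs that change sign, energy as the mean squared sample value, and spectral flatness as the ratio of the geometric to the arithmetic mean of a power spectrum. All function names and the `eps` floor are hypothetical.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        raise ValueError("frame needs at least two samples")
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(x * x for x in frame) / len(frame)


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.
    Close to 1 for a flat spectrum, close to 0 for a peaky one."""
    values = [max(p, eps) for p in power_spectrum]  # eps avoids log(0)
    log_gm = sum(math.log(v) for v in values) / len(values)
    am = sum(values) / len(values)
    return math.exp(log_gm) / am
```

A frame-level detector would typically compute such features per frame and pass them to a classifier or threshold rule; that stage is not shown here.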