2003
DOI: 10.1016/s0167-6393(02)00066-3

Towards improving speech detection robustness for speech recognition in adverse conditions

citations
Cited by 60 publications
(40 citation statements)
references
References 9 publications
“…Many authors claim that VADs are well compared by evaluating speech recognition performance (Woo et al, 2000) since nonefficient speech/non-speech classification is an important source of the degradation of recognition performance in noisy environments (Karray & Martin, 2003). There are two clear motivations for that: i) noise parameters such as its spectrum are updated during nonspeech periods being the speech enhancement system strongly influenced by the quality of the noise estimation, and ii) frame-dropping (FD), a frequently used technique in speech recognition to reduce the number of insertion errors caused by the noise, is based on the VAD decision and speech misclassification errors lead to loss of speech, thus causing irrecoverable deletion errors.…”
Section: Speech Recognition Experiments (mentioning)
confidence: 99%
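The two motivations in the statement above (noise estimates updated only during non-speech frames, and frame-dropping driven by the VAD decision) can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the fixed energy threshold, the smoothing factor `alpha`, and all function names are assumptions made for the example.

```python
from typing import List, Sequence


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def energy_vad(energies: Sequence[float], threshold: float) -> List[bool]:
    """Hypothetical VAD: a frame counts as speech if its energy exceeds a fixed threshold."""
    return [e > threshold for e in energies]


def update_noise_estimate(noise: float, energy: float, is_speech: bool,
                          alpha: float = 0.9) -> float:
    """Motivation (i): update the noise estimate only during non-speech frames."""
    if is_speech:
        return noise  # freeze the estimate while speech is present
    return alpha * noise + (1.0 - alpha) * energy


def drop_frames(frames: Sequence, decisions: Sequence[bool]) -> list:
    """Motivation (ii): frame-dropping keeps only frames the VAD labels as speech."""
    return [f for f, keep in zip(frames, decisions) if keep]


# Toy signal: quiet frame, loud frame, quiet frame (assumed values).
signal = [0.01] * 4 + [0.5] * 4 + [0.01] * 4
energies = frame_energies(signal, frame_len=4)
decisions = energy_vad(energies, threshold=0.01)
kept = drop_frames(list(range(len(energies))), decisions)
```

A speech frame misclassified as non-speech would be removed by `drop_frames` and could not be recovered by the recognizer, which is the deletion-error risk the statement describes.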
“…These systems often benefit from using voice activity detectors (VADs) which are frequently used in such application scenarios for different purposes. Speech/non-speech detection is an unsolved problem in speech processing and affects numerous applications including robust speech recognition (Karray & Martin, 2003), (Ramírez et al, 2003), discontinuous transmission (ETSI, 1999), (ITU, 1996), estimation and detection of speech signals (Krasny, 2000), real-time speech transmission on the Internet (Sangwan et al, 2002) or combined noise reduction and echo cancellation schemes in the context of telephony (Basbug et al, 2003). The speech/non-speech classification task is not as trivial as it appears, and most of the VAD algorithms fail when the level of background noise increases.…”
Section: Introduction (mentioning)
confidence: 99%
“…A fuzzy set F defined on a discourse universe U is characterized by a membership function μ_F(x) which takes values in the interval [0,1]. A fuzzy set is a generalization of a crisp set.…”
Section: Fuzzy Logic (mentioning)
confidence: 99%
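To make the definition above concrete, here is a small sketch contrasting a fuzzy membership function with a crisp one. The triangular shape and the parameter names `a`, `b`, `c` are assumptions chosen for illustration; the quoted statement does not specify any particular membership function.

```python
def triangular_membership(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b (a < b < c).

    Returns a degree of membership in [0, 1].
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def crisp_membership(x: float, lo: float, hi: float) -> float:
    """Crisp set as a special case: membership is only ever 0 or 1."""
    return 1.0 if lo <= x <= hi else 0.0


peak = triangular_membership(5.0, 0.0, 5.0, 10.0)
halfway = triangular_membership(2.5, 0.0, 5.0, 10.0)
outside = triangular_membership(-1.0, 0.0, 5.0, 10.0)
```

The crisp function is the restriction of the fuzzy case to the two values 0 and 1, which is the sense in which a fuzzy set generalizes a crisp set.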
“…Detecting the presence of speech in a noisy signal is a problem affecting numerous applications including robust speech recognition [1,2], discontinuous transmission (DTX) in voice communications [3,4] or real-time speech transmission on the Internet [5]. The classification task is not as trivial as it appears, and most of the VAD algorithms often fail in high noise conditions.…”
Section: Introduction (mentioning)
confidence: 99%
“…But the challenge to VAD is to detect speech under low signal-to-noise ratio (SNR) scenarios and also under the influence of nonstationary noises which cause significant errors. The main applications related to VAD are speech coding [1] and speech recognition [2]. VAD stands as a preprocessing stage for major speech processing applications.…”
Section: Introduction (mentioning)
confidence: 99%