2010 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2010.5495599

Integration of sporadic noise model in POMDP-based voice activity detection

Abstract: A partially observable Markov decision process (POMDP) has recently been applied to a voice activity detector (VAD), making it possible to incorporate knowledge about the recording environment into the decision process and achieve more stable performance in noisy conditions. In this paper, the model is extended to use prior knowledge about possible intermittent noise signals, such as breath or click sounds, in addition to stationary background noise types. The experimental resu…

Cited by 4 publications
(2 citation statements)
References 4 publications
“…However, most research activities have focused on improving the frame-level decision performance by developing robust features [1]-[2], feature combinations [3], and modeling approaches [4]-[9], while little attention has been paid to utterance-level decision, integrating both decision processes, or improving both processes at the same time. The first process is frame-level speech/non-speech classification based on statistical hypothesis testing, and the second process involves utterance-level speech boundary decision based on heuristic knowledge.…”
Section: Introduction (mentioning)
confidence: 99%
“…The overall performance of endpoint detection is determined through these two processes. However, most research activities have focused on improving the frame-level decision performance by developing robust features [1]-[2], feature combinations [3], and modeling approaches [4]-[9], while little attention has been paid to utterance-level decision, integrating both decision processes, or improving both processes at the same time. This is because the statistical approach provides a way to optimize frame-level decision parameters systematically, whereas a heuristic-knowledge-based approach makes it difficult to define and optimize utterance-level decision parameters.…”
Section: Introduction (mentioning)
confidence: 99%