2014
DOI: 10.4218/etrij.14.2214.0030

Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic

Abstract: In this paper, we propose the use of data‐driven probabilistic utterance‐level decision logic to improve Weighted Finite State Transducer (WFST)‐based endpoint detection. In general, endpoint detection is dealt with using two cascaded decision processes. The first process is frame‐level speech/non‐speech classification based on statistical hypothesis testing, and the second process is a heuristic‐knowledge‐based utterance‐level speech boundary decision. To handle these two processes within a unified framework,…
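The two cascaded decision processes named in the abstract can be illustrated with a minimal sketch. This is not the paper's WFST-based method; it only shows the conventional cascade the abstract describes as the starting point. Everything specific here is an assumption made for illustration: the equal-variance Gaussian models of frame log-energy, their parameters, the zero threshold, and the `min_speech` and `hangover` frame counts.

```python
def frame_llr(energy_db, speech_mean=-20.0, noise_mean=-50.0, std=8.0):
    """Log-likelihood ratio of one frame's log-energy.

    Assumes (hypothetically) that speech and noise energies follow Gaussians
    with equal variance, so the normalisation terms cancel.
    """
    return ((energy_db - noise_mean) ** 2
            - (energy_db - speech_mean) ** 2) / (2.0 * std ** 2)


def classify_frames(energies_db, threshold=0.0):
    """Stage 1: frame-level speech/non-speech decision by an LLR test."""
    return [frame_llr(e) > threshold for e in energies_db]


def find_endpoints(is_speech, min_speech=3, hangover=4):
    """Stage 2: heuristic utterance-level boundary decision.

    The utterance starts once `min_speech` consecutive speech frames are seen
    and ends once `hangover` consecutive non-speech frames follow.
    Returns (start, end) frame indices with `end` exclusive, or None.
    """
    start = None
    run_speech = 0
    run_silence = 0
    for i, speech in enumerate(is_speech):
        if start is None:
            run_speech = run_speech + 1 if speech else 0
            if run_speech >= min_speech:
                start = i - min_speech + 1
        else:
            run_silence = 0 if speech else run_silence + 1
            if run_silence >= hangover:
                return (start, i - hangover + 1)
    if start is None:
        return None
    # Input ended before the hangover expired; drop any trailing silence.
    return (start, len(is_speech) - run_silence)


# Synthetic log-energies: noise, speech, a short pause, speech, noise.
energies = [-50.0] * 5 + [-20.0] * 6 + [-50.0] * 2 + [-20.0] * 3 + [-50.0] * 6
print(find_endpoints(classify_frames(energies)))  # (5, 16)
```

The two-frame pause inside the utterance is shorter than the assumed hangover, so it does not trigger an endpoint. Hand-tuned constants of this kind are the heuristic knowledge that the abstract proposes to replace with data-driven probabilistic decision logic.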

Cited by 3 publications (1 citation statement)
References 10 publications
“…Chung et al proposed an EPD algorithm that classifies speech and non-speech states using the SAD technique based on a log-likelihood ratio (LLR) test proposed in [9], and then finds the endpoint with the online decoder designed based on a weighted finite-state transducer (wFST) [10]. Since it is difficult to optimize the LLR test-based SAD and wFST jointly, this EPD scheme was further improved by adopting the quantized LLR states as the wFST input instead of the binary speech/non-speech state [11]. The performance of these EPD structures is dramatically enhanced with the help of the SAD algorithms based on deep neural networks (DNN), which yield the state-of-the-art SAD performance via deep nonlinear hidden layers [12]- [17].…”
Section: Introduction (mentioning)
confidence: 99%
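The citation statement above mentions replacing the binary speech/non-speech state with quantized LLR states as the WFST input. As a rough illustration of what such quantization could look like, the sketch below maps a continuous LLR value to a small integer symbol. The bin edges and the number of levels are hypothetical; the cited works' actual quantization scheme is not given here.

```python
from bisect import bisect_right


def quantize_llr(llr, edges=(-2.0, 0.0, 2.0)):
    """Map a continuous LLR to a discrete state index.

    `edges` are hypothetical, ascending bin boundaries; with three edges the
    result is one of four states, 0 (strongly non-speech) to 3 (strongly speech).
    """
    return bisect_right(edges, llr)


# A binary decision keeps only the sign of the LLR; the quantized state also
# keeps a coarse measure of how confident the frame-level test was.
llrs = [-5.0, -1.0, 1.0, 3.0]
print([quantize_llr(v) for v in llrs])  # [0, 1, 2, 3]
```

With these assumed edges, states 2 and 3 correspond to the frames a zero-threshold binary test would label as speech, so the binary decision remains recoverable from the quantized symbol.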