Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic

Chung, Hoon; Lee, Sung Joo; Lee, Yun Keun

doi:10.4218/etrij.14.2214.0030

Cited by 3 publications

(1 citation statement)

References 10 publications

“…Chung et al. proposed an EPD algorithm that classifies speech and non-speech states using the SAD technique based on a log-likelihood ratio (LLR) test proposed in [9], and then finds the endpoint with the online decoder designed based on a weighted finite-state transducer (wFST) [10]. Since it is difficult to optimize the LLR test-based SAD and wFST jointly, this EPD scheme was further improved by adopting the quantized LLR states as the wFST input instead of the binary speech/non-speech state [11]. The performance of these EPD structures is dramatically enhanced with the help of the SAD algorithms based on deep neural networks (DNN), which yield the state-of-the-art SAD performance via deep nonlinear hidden layers [12]-[17].…”

Section: Introduction (mentioning)

confidence: 99%
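
The pipeline described in the citation statement above (a frame-level LLR test, quantized LLR states, then a decoder that finds the endpoint) can be illustrated with a minimal sketch. This is not the cited authors' implementation: the decoder here is a plain finite-state machine with frame counters rather than a weighted FST, and the Gaussian parameters, quantization edges, thresholds, and every function name are hypothetical choices made only for illustration.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-density of a scalar Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def frame_llr(x, speech=(10.0, 4.0), noise=(2.0, 1.0)):
    """LLR of one scalar feature (e.g. log-energy).

    The (mean, variance) pairs are made-up values, not taken from the paper.
    """
    return gaussian_loglik(x, *speech) - gaussian_loglik(x, *noise)


def quantize(llr, edges=(-5.0, 0.0, 5.0)):
    """Map an LLR to an integer level in 0..len(edges) (assumed bin edges)."""
    return sum(llr > e for e in edges)


def detect_endpoints(features, min_speech=3, min_silence=5, speech_level=2):
    """Return (start, end) frame indices of the first utterance.

    A counter-based state machine stands in for the wFST decoder: speech
    begins after `min_speech` consecutive speech-like frames and ends after
    `min_silence` consecutive non-speech frames. `end` is the index of the
    first frame of the closing silence, or None if no endpoint was found.
    """
    state, run, start = "silence", 0, None
    for t, x in enumerate(features):
        is_speech = quantize(frame_llr(x)) >= speech_level
        if state == "silence":
            run = run + 1 if is_speech else 0
            if run >= min_speech:
                state, start, run = "speech", t - min_speech + 1, 0
        else:
            run = 0 if is_speech else run + 1
            if run >= min_silence:
                return start, t - min_silence + 1
    return start, None


if __name__ == "__main__":
    # Four low-energy frames, six high-energy frames, eight low-energy frames.
    print(detect_endpoints([2.0] * 4 + [10.0] * 6 + [2.0] * 8))
```

Feeding quantized levels rather than a binary decision into the state machine mirrors the refinement mentioned in the statement; in a weighted decoder each level could carry its own transition cost, which this sketch reduces to a single threshold.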

End-to-End Speech Endpoint Detection Utilizing Acoustic and Language Modeling Knowledge for Online Low-Latency Speech Recognition

Hwang; Chang. 2020. IEEE Access

ABSTRACT Speech endpoint detection (EPD) benefits from the decoder state features (DSFs) of an online automatic speech recognition (ASR) system. However, the DSFs are obtained via the ASR decoding process, which can become prohibitively expensive, especially in limited-resource scenarios such as embedded devices. To address this problem, this paper proposes a language model (LM)-based end-of-utterance (EOU) predictor, which is trained to determine the framewise probabilities of the EOU token conditioned on the previous word history obtained from the 1-best decoding hypothesis of the ASR system in an end-to-end manner, without an actual decoding process at test time. Further, a novel end-to-end EPD strategy is presented to incorporate phonetic embedding (PE)-based acoustic modeling knowledge and the proposed EOU predictor-based language modeling knowledge into an acoustic feature embedding (AFE)-based EPD approach within a recurrent neural network (RNN)-based EPD framework. The proposed EPD algorithm is built upon ensemble RNNs, which are trained independently for three parts, namely the proposed LM-based EOU predictor, the AFE-based EPD, and the PE-based acoustic model (AM), each in accordance with its own target. The ensemble RNNs are concatenated at the level of the last hidden layers and then attached to a fully connected deep neural network (DNN)-based EPD classifier, which is trained in accordance with the ultimate EPD target. Thereafter, they are jointly retrained in the second step of the DNN training to yield a lower endpoint error. The proposed EPD framework was evaluated in terms of endpoint accuracy and word error rate on the CHiME-3 and large-scale ASR tasks. The experimental results show that the proposed EPD algorithm outperforms the conventional EPD approaches.
INDEX TERMS acoustic model (AM), end-of-turn detection, end-of-utterance (EOU) detection, feature embedding, language model (LM), online speech recognition, pause hesitation, speech endpoint detection (EPD), spoken dialogue system

GenieTutor: A Computer-Assisted Second-Language Learning System Based on Spoken Language Understanding

Kwon; Lee; Roh; et al. 2015. Natural Language Dialog Systems and Intelligent Assistants

GenieTutor: a computer assisted second-language learning system based on semantic and grammar correctness evaluations

Kwon; Lee; Kim; et al. 2015. Critical CALL – Proceedings of the 2015 EUROCALL Conference, Padova, Italy

This paper introduces a Dialog-Based Computer-Assisted second-Language Learning (DB-CALL) system that uses semantic and grammar correctness evaluations, and reports the results of an experiment with it. While the system converses with English learners about a given topic, it automatically evaluates the grammar and content appropriateness of their English utterances, then gives corrective feedback on grammar and semantics. The system consists of a speech recognition module optimized for non-native speakers and a tutoring module based on semantic and grammar correctness evaluation. The tutoring module decides whether to continue the dialogue or ask learners to try again by evaluating the semantic correctness of their utterances, and also gives them turn-by-turn semantic and grammatical corrective feedback. The semantic correctness evaluation consists of a two-class classifier for the 'pass or try again' decision and a six-class classifier for semantic corrective feedback, both using domain knowledge and a language model. Grammatical correctness is evaluated by a hybrid grammatical error correction system composed of four approaches: a rule-based, a machine learning-based, an n-gram-based, and an edit distance-based approach. In the experiments, in which 30 subjects took part in a real environment, the 'pass or try again' evaluation had a success rate of 97.5%, the semantic feedback classification had a success rate of 87.8%, and the precision and recall for grammar error correction were 79.2% and 60.9%, respectively.
