2010 IEEE International Conference on Acoustics, Speech and Signal Processing 2010
DOI: 10.1109/icassp.2010.5494980
|View full text |Cite
|
Sign up to set email alerts
|

Spoken term detection with Connectionist Temporal Classification: A novel hybrid CTC-DBN decoder

Abstract: This paper proposes a novel system for robust keyword detection in continuous speech. Our decoder is composed of a bidirectional Long Short-Term Memory recurrent neural network using a Connectionist Temporal Classification (CTC) output layer, and a Dynamic Bayesian Network (DBN). The CTC network exploits bidirectional context information to reliably identify phonemes, whereas the DBN is able to discriminate between keywords and arbitrary speech while explicitly modeling substitutions, deletions, and insertions… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
7
0

Year Published

2010
2010
2017
2017

Publication Types

Select...
3
2
2

Relationship

2
5

Authors

Journals

citations
Cited by 11 publications
(7 citation statements)
references
References 10 publications
0
7
0
Order By: Relevance
“…As a second baseline model, we evaluated the keyword spotting performance of a hybrid BLSTM-HMM system, since this approach was shown to prevail over the standard HMM approach [61]. Unlike the proposed Tandem model, the hybrid approach exclusively uses BLSTM phoneme predictions for keyword detection.…”
Section: Methodsmentioning
confidence: 99%
See 1 more Smart Citation
“…As a second baseline model, we evaluated the keyword spotting performance of a hybrid BLSTM-HMM system, since this approach was shown to prevail over the standard HMM approach [61]. Unlike the proposed Tandem model, the hybrid approach exclusively uses BLSTM phoneme predictions for keyword detection.…”
Section: Methodsmentioning
confidence: 99%
“…Typical phoneme prediction errors made by the CTC network are modeled by the HMM layer of the hybrid system (similar to the trained CPFs p(b t |s t ) for the Tandem model). For further details on the hybrid approach, the reader is referred to [61].…”
Section: Methodsmentioning
confidence: 99%
“…In [22], a keyword spotter for specific keywords using BLSTM-CTC is first proposed but only the most intuitive postprocessing is applied. Besides, [23] comes up with a complicated CTC-DBN decoder which is shown to outperform a Keyword-Filler Hidden Markov Model system. In this paper, the LSTM is unidirectional so the number of parameters is reasonable for many applications.…”
Section: Introductionmentioning
confidence: 99%
“…Future experiments will include the design of bottle-neck [5] BLSTM networks as well as the combination of multi-stream BLSTM-HMM systems with techniques for feature enhancement in order to allow further performance gains in noisy conditions. A further promising aspect for future research is to use the principle of connectionist temporal classification [6] for continuous speech recognition.…”
Section: Discussionmentioning
confidence: 99%
“…Most studies concentrate on improving the front-or back-end of ASR systems based on Hidden Markov Models (HMM), however, strategies towards improving ASR in challenging conditions by combining the HMM principle with multilayer perceptrons (MLP) or recurrent neural networks (RNN) are gaining more and more attention [4,5,6]. These techniques can be roughly categorized into hybrid approaches that apply neural networks to generate state posteriors for HMMs, and Tandem approaches that use the network output as features instead of (or in combination with) standard cepstral features.…”
Section: Introductionmentioning
confidence: 99%