Phonemic HMM constrained by statistical VQ-code transition

Takahashi, Satoshi; Matsuoka, Tatsuo; Shikano, Kiyohiro

doi:10.1109/icassp.1992.225848

Cited by 6 publications

(6 citation statements)

References 2 publications


“…The reasons for this phenomenon that the incorporation of frame correlations caused even more errors than the baseline system can be explained in several ways. First, as is known from [12], the characteristics of frame correlations can be considered highly speaker-dependent. However, in our experiments, all the test speakers were different from the training speakers, and the frame correlations were used in speaker-independent mode.…”

Section: Experimental Results For Phoneme-independent Frame Correl (mentioning)

confidence: 99%

“…But, it has been known that the correlation PD's obtained in such a way tend to concentrate on the correlation characteristics of the frequently observed phonemes. To compensate for this, in [12], the correlation PD's are obtained by pooling the cooccurrence counts with equal contribution of each phoneme, and they are shown to be better empirically than those obtained by simple pooling. Assume that there are phonemes, used as units for recognition.…”

Section: Frame-correlation Pd (mentioning)

confidence: 99%
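The equal-contribution pooling described in the citation statement above can be sketched roughly as follows. This is a minimal illustration, not the method of [12]: the function name `pool_cooccurrence`, the table layout, and the choice to rescale each phoneme's count table to unit mass before summing are all assumptions made for the example.

```python
def pool_cooccurrence(counts_by_phoneme, equal_contribution=True):
    """Pool per-phoneme VQ-code co-occurrence counts into P(cur | prev).

    counts_by_phoneme maps a phoneme label to a K x K table where
    table[prev][cur] counts how often code `cur` followed code `prev`.
    """
    tables = list(counts_by_phoneme.values())
    k = len(tables[0])
    pooled = [[0.0] * k for _ in range(k)]
    for table in tables:
        total = sum(sum(row) for row in table)
        if total == 0:
            continue  # phoneme never observed; nothing to add
        # Assumption: "equal contribution" means rescaling each phoneme's
        # table to unit mass before summing. Simple pooling keeps raw counts.
        weight = 1.0 / total if equal_contribution else 1.0
        for prev in range(k):
            for cur in range(k):
                pooled[prev][cur] += weight * table[prev][cur]
    cond = []
    for row in pooled:
        row_sum = sum(row)
        # Previous codes that were never seen fall back to a uniform row.
        cond.append([x / row_sum for x in row] if row_sum > 0 else [1.0 / k] * k)
    return cond


# Hypothetical toy data: two phonemes, two VQ codes.
counts = {
    "a": [[90, 10], [0, 0]],  # frequently observed phoneme
    "b": [[1, 9], [0, 0]],    # rarely observed phoneme
}
simple = pool_cooccurrence(counts, equal_contribution=False)
equal = pool_cooccurrence(counts, equal_contribution=True)
```

With this toy data, simple pooling gives a first row of about [0.83, 0.17], dominated by the frequent phoneme, while the equal-contribution variant gives [0.5, 0.5].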

“…Usually, the probability of an output symbol which is unlikely to be observed in a state, is lowered in If, however, this output symbol is important for distinguishing a word from others, the use of may incur more errors for the word than the use of Furthermore, if the observation of previous frame is not reliable due to noise or other factors, the a posteriori PD's may bring about more fatal effects than those of the a priori. In [12], to alleviate this problem, a threshold type of binary checking as to whether to use the a posteriori PD's was tried.…”

Section: Priori-posteriori Combination Of Pd's (mentioning)

confidence: 99%

“…Even though this full parameterization is the most natural way to express the behavior of temporal correlations, the number of parameters to be estimated may increase too excessively to get reliable estimates for the output PD's. As an alternative to this, a bigram-constrained (BC) HMM was proposed [12], [13]. In the BC HMM, the spectral shape of an output PD in a state is restricted according to the observation symbol on the previous frame.…”

Section: Introduction (mentioning)

confidence: 99%

“…Another possibility is to use phoneme-independent frame correlation PD's, which can be obtained by merging frame correlation PD's over all phonemes. But, as shown in [12], even the phoneme-independent correlation PD's must take the contribution of each phoneme into consideration with much care. Next, we present a technique to combine two kinds of PD's through some exponents which are estimated according to the maximum mutual information (MMI) criterion [20].…”

Section: Introduction (mentioning)

confidence: 99%


Frame-correlated hidden Markov model based on extended logarithmic pool

Kim

Un

1997

IEEE Trans. Speech Audio Process.


We present a novel method to incorporate temporal correlations into a speech recognition system based on conventional hidden Markov models (HMM's). Temporal correlations are considered useful for recognition because the speech features of the present frame are highly informative about the feature characteristics of neighboring frames. In this paper, by treating these correlations in the form of conditional probability distributions (PD's), we propose a new technique for incorporating frame correlations. With the proposed method, called the extended logarithmic pool (ELP), we approximate a joint conditional PD by separate conditional PD's associated with the respective conditions. We provide a constrained optimization algorithm with which we can find the optimal values for the pooling weights. For practical purposes, we also suggest methods to get robust PD estimates for characterizing frame correlation. In addition, to improve model discriminability, a technique to combine two kinds of PD's through the exponents is introduced. In speaker-independent continuous speech recognition experiments, the proposed approaches reduce errors by up to 20.5% compared with the conventional bigram-constrained (BC) HMM method.
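As a rough illustration of the pooling idea in this abstract, the sketch below implements a plain weighted logarithmic pool, in which the pooled distribution is proportional to the product of the component distributions raised to their weights. It does not reproduce the paper's extension or its constrained weight optimization; the function name `log_pool`, the probability floor, and the example weights are assumptions for this example only.

```python
import math


def log_pool(dists, weights, floor=1e-12):
    """Combine discrete distributions as p(k) proportional to prod_i dists[i][k] ** weights[i].

    `floor` (an assumption of this sketch) keeps log() defined when a
    component distribution assigns zero probability to a symbol.
    """
    k = len(dists[0])
    log_scores = []
    for sym in range(k):
        score = 0.0
        for dist, w in zip(dists, weights):
            score += w * math.log(max(dist[sym], floor))
        log_scores.append(score)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Two hypothetical conditional PD's over a two-symbol codebook.
pd_given_cond1 = [0.5, 0.5]
pd_given_cond2 = [0.8, 0.2]
pooled = log_pool([pd_given_cond1, pd_given_cond2], [1.0, 1.0])
```

Setting a weight to zero removes that component's influence entirely, which is one reason the choice of pooling weights matters.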


Speech recognition using phoneme HMM constrained by frame correlation

Takahashi

Matsuoka

Minami

et al. 1994

Electron Comm Jpn Pt III


One of the problems with the hidden Markov model (HMM) in performing speech recognition is that the local transition information of the feature vectors is not incorporated into the mechanism of the model and the model is not constrained by transitions of the feature vectors. Thus, the output probability distribution never changes during recognition. Furthermore, all transitions between the vectors that have high probabilities are allowed even if those transitions did not appear in the training data. This paper proposes a bigram‐constrained HMM that uses correlations between two frames to constrain the feature distributions of a speaker‐independent HMM to the region most appropriate for the speaker. Since the output probability of the bigram‐constrained HMM is a conditional probability restricted by the feature vector of the previous frame, the output probability changes dynamically at each frame depending on the feature vector of the previous frame. Constraining the feature distribution makes it possible to reduce the overlapping of feature distributions between different phonemes which improves recognition performance. Previously, we proposed the discrete bigram‐constrained HMM which is based on the combination of a discrete speaker‐independent HMM and the VQ‐code bigram. We showed that it performed better than conventional speaker‐independent HMMs. In this paper, the strategy is extended to the tied‐mixture bigram‐constrained HMM and the continuous bigram‐constrained HMM to obtain better recognition performance. These three types of HMMs are formulated and evaluated by phoneme recognition in continuous speech.
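The discrete case described in this abstract, where a state's output distribution is conditioned on the previous frame's VQ code, might be sketched as below. The combination rule used here (multiply the state's output probabilities by the bigram row for the previous code, then renormalize) is an assumption chosen for illustration, not a formula taken from the paper; `bc_output_probs` and the toy numbers are likewise hypothetical.

```python
def bc_output_probs(state_probs, bigram, prev_code):
    """Output distribution of one HMM state, constrained by the previous VQ code.

    state_probs[k]  : unconstrained output probability of code k in this state
    bigram[l][k]    : P(code k at frame t | code l at frame t-1)
    Assumed rule    : b(k | l) = state_probs[k] * bigram[l][k] / normalizer
    """
    row = bigram[prev_code]
    unnorm = [b * p for b, p in zip(state_probs, row)]
    z = sum(unnorm)
    if z == 0:
        # No code is allowed by both the state and the bigram; fall back
        # to the unconstrained distribution (a choice made for this sketch).
        return list(state_probs)
    return [u / z for u in unnorm]


# Hypothetical three-code codebook.
state_probs = [0.5, 0.3, 0.2]
bigram = [
    [0.8, 0.2, 0.0],  # after code 0, code 2 was never seen in training
    [0.1, 0.6, 0.3],
    [0.0, 0.3, 0.7],
]
after_0 = bc_output_probs(state_probs, bigram, prev_code=0)
after_2 = bc_output_probs(state_probs, bigram, prev_code=2)
```

The same state yields different output distributions depending on the previous code, and a transition that never appeared in the training data (code 0 followed by code 2) receives zero probability.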
