Derivative kernels for noise robust ASR

Ragni, Anton; Gales, M. J. F.

doi:10.1109/asru.2011.6163916

Cited by 21 publications

(27 citation statements)

References 14 publications


“…To summarize our experimental results, by empirically tuning the phase factor, we achieved 16.8% WER for JUD/SGMM system on the Aurora 4 corpus, which is comparable to the state-of-the-art noise compensation results on this task [42], [46]. Further improvements have been observed by VTS-based noise adaptive training [39], joint speaker/noise compensation [42], and discriminative adaptive training [46].…”

Section: Discussion (supporting)

confidence: 61%

Joint Uncertainty Decoding for Noise Robust Subspace Gaussian Mixture Models

Chin, Ghoshal, et al., 2013

IEEE Trans. Audio Speech Lang. Process.


Abstract: Joint uncertainty decoding (JUD) is a model-based noise compensation technique for conventional Gaussian Mixture Model (GMM) based speech recognition systems. Unlike vector Taylor series (VTS) compensation which operates on the individual Gaussian components in an acoustic model, JUD clusters the Gaussian components into a smaller number of classes, sharing the compensation parameters for the set of Gaussians in a given class. This significantly reduces the computational cost. In this paper, we investigate noise compensation for subspace Gaussian mixture model (SGMM) based speech recognition systems using JUD. The total number of Gaussian components in an SGMM is typically very large. Therefore direct compensation of the individual Gaussian components, as performed by VTS, is computationally expensive. In this paper we show that JUD-based noise compensation can be successfully applied to SGMMs in a computationally efficient way. We evaluate the JUD/SGMM technique on the standard Aurora 4 corpus. Our experimental results indicate that the JUD/SGMM system results in lower word error rates compared with a conventional GMM system with either VTS-based or JUD-based noise compensation.


“…Joint learning {λ, α} in the large margin framework will be investigated in the future. Future work will also involve the kernelization of the proposed structured SVM to support high dimensional feature spaces such as the derivative feature space [28].…”

Section: Discussion (mentioning)

confidence: 99%

“…This feature space concatenates the log-likelihoods from all models, including the correct model and competing ones, to yield additional information from the observations. More general feature-spaces, such as derivative ones [28], can relax the conditional independence assumption. Using the above joint feature-spaces the dot-product of the φ(O, w; θ) and structured SVM parameter α can be evaluated by accumulating every segment score [14] …”

Section: A Joint Feature Space (mentioning)

confidence: 99%


Structured SVMs for Automatic Speech Recognition

Zhang and Gales, 2013

IEEE Trans. Audio Speech Lang. Process.

Self Cite


Abstract: Combining generative and discriminative models offers a flexible sequence classification framework. This paper describes a structured support vector machines (SSVM) approach in this framework suitable for medium to large vocabulary speech recognition. One important aspect of SSVMs is the form of the joint feature space. In this work features based on context-dependent generative models are used. These features require a segmentation to be specified; a Viterbi-like scheme for obtaining the "optimal" segmentation is described. Large margin log linear models with a zero mean Gaussian prior of discriminative parameters are shown to be an example of this model. However, depending on the nature of the feature space, a non-zero prior may be more appropriate. An extended SSVM training algorithm is proposed to allow a general Gaussian prior to be incorporated into the large margin criterion. To speed up the training process, a 1-slack algorithm, caching competing hypotheses and parallelization strategies are also described. The performance of SSVMs is evaluated on small and medium to large speech recognition tasks: AURORA 2 and 4.
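
The citation statement for this paper describes a joint feature space that concatenates log-likelihoods from all models, with the dot product against the parameter vector α evaluated by accumulating segment scores. A minimal Python sketch of that idea follows. The block layout (each segment's log-likelihood vector placed in the block of its hypothesised label), the function names, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def joint_feature(segment_loglikes, labels, num_models):
    """Explicit joint feature vector (assumed block layout).

    Row w of the block matrix accumulates the log-likelihood vectors of
    all segments hypothesised as model w; the result is then flattened.
    """
    phi = np.zeros((num_models, num_models))
    for psi, w in zip(segment_loglikes, labels):
        phi[w] += psi
    return phi.reshape(-1)

def accumulated_score(segment_loglikes, labels, alpha):
    """Same dot product, computed by summing one score per segment."""
    return sum(float(alpha[w] @ psi) for psi, w in zip(segment_loglikes, labels))

# Toy data (hypothetical): 3 models, 2 segments.
# Each row holds log p(O_i | v) for every model v.
segment_loglikes = np.array([[-10.0, -12.0, -15.0],
                             [-20.0, -18.0, -25.0]])
labels = [0, 1]       # hypothesised model for each segment
alpha = np.eye(3)     # identity weights: only the hypothesised model's log-likelihood counts

score = accumulated_score(segment_loglikes, labels, alpha)
explicit = float(alpha.reshape(-1) @ joint_feature(segment_loglikes, labels, 3))
```

With identity weights the score reduces to the sum of the hypothesised models' log-likelihoods, and the per-segment accumulation agrees with the explicit dot product without ever building the full feature vector.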


“…Features for large-vocabulary systems will be extracted per phone, like in [20], so that the segmentation is likely to have greater impact on performance.…”

Section: Methods (mentioning)

confidence: 99%

Efficient decoding with generative score-spaces using the expectation semiring

Dalen, Ragni, and Gales, 2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing


State-of-the-art speech recognisers are usually based on hidden Markov models (HMMs). They model a hidden symbol sequence with a Markov process, with the observations independent given that sequence. These assumptions yield efficient algorithms, but limit the power of the model. An alternative model that allows a wide range of features, including word- and phone-level features, is a log-linear model. To handle, for example, word-level variable-length features, the original feature vectors must be segmented into words. Thus, decoding must find the optimal combination of segmentation of the utterance into words and word sequence. Features must therefore be extracted for each possible segment of audio. For many types of features, this becomes slow. In this paper, long-span features are derived from the likelihoods of word HMMs. Derivatives of the log-likelihoods, which break the Markov assumption, are appended. Previously, decoding with this model took cubic time in the length of the sequence, and longer for higher-order derivatives. This paper shows how to decode in quadratic time.
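
As a rough illustration of appending log-likelihood derivatives to a likelihood feature, the Python sketch below uses a single diagonal-covariance Gaussian as a stand-in for a word HMM. That simplification, the function names, and the toy data are assumptions; the paper itself works with HMM likelihoods and an expectation semiring, neither of which is reproduced here.

```python
import numpy as np

def loglik(obs, mean, var):
    """Log-likelihood of a segment (T x D) under a diagonal Gaussian."""
    diff = obs - mean
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var))

def score_space(obs, mean, var):
    """Feature vector: [log-likelihood, d(log-likelihood)/d(mean)]."""
    ll = loglik(obs, mean, var)
    # First-order derivative w.r.t. the mean: sum over frames of (o_t - mean) / var
    dmean = np.sum((obs - mean) / var, axis=0)
    return np.concatenate(([ll], dmean))

# Toy segment (hypothetical): 2 frames, 2 dimensions
obs = np.array([[0.0, 1.0],
                [2.0, 3.0]])
mean = np.zeros(2)
var = np.array([1.0, 2.0])

feats = score_space(obs, mean, var)
```

Here the derivative part of `feats` sums the variance-normalised residuals over all frames in the segment, so each feature depends on the whole segment rather than on a single frame.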
