Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs

Çetin, Özgür; Magimai-Doss, Mathew; Livescu, Karen; Kantor, Arthur; King, Simon; Bartels, Chris; Frankel, Joe

doi:10.1109/asru.2007.4430080

“…Tandem features, based on phone posterior probability estimates, were originally proposed to improve monolingual speech recognition [11], but they have also proven effective in the cross-lingual setting. In this approach, multi-layer perceptrons (MLPs) trained using source language acoustic data of source language, are used to generate the MLP phone posterior features for the target language [12], [13], [14], [15]. As tandem acoustic features are not directly dependent on the lexicon, this approach is simple to apply.…”

Section: Introduction

mentioning

confidence: 99%

Regularized Subspace Gaussian Mixture Models for Speech Recognition

Lu, Ghoshal, Renals

2011

IEEE Signal Process. Lett.

Abstract: We investigate cross-lingual acoustic modelling for low resource languages using the subspace Gaussian mixture model (SGMM). We assume the presence of acoustic models trained on multiple source languages, and use the global subspace parameters from those models for improved modelling in a target language with limited amounts of transcribed speech. Experiments on the GlobalPhone corpus using Spanish, Portuguese, and Swedish as source languages and German as target language (with 1 hour and 5 hours of transcribed audio) show that multilingually trained SGMM shared parameters result in lower word error rates (WERs) than using those from a single source language. We also show that regularizing the estimation of the SGMM state vectors by penalizing their 1-norm helps to overcome numerical instabilities and leads to lower WER.

“…In this approach, multi-layer perceptrons (MLPs) trained using source language acoustic data of source language, are used to generate the MLP phone posterior features for the target language [12], [13], [14], [15]. As tandem acoustic features are not directly dependent on the lexicon, this approach is simple to apply.…”

Section: Introduction

mentioning

confidence: 99%

Regularized subspace Gaussian mixture models for cross-lingual speech recognition

Lu, Ghoshal, Renals

2011

2011 IEEE Workshop on Automatic Speech Recognition & Understanding

Abstract: We investigate cross-lingual acoustic modelling for low resource languages using the subspace Gaussian mixture model (SGMM). We assume the presence of acoustic models trained on multiple source languages, and use the global subspace parameters from those models for improved modelling in a target language with limited amounts of transcribed speech. Experiments on the GlobalPhone corpus using Spanish, Portuguese, and Swedish as source languages and German as target language (with 1 hour and 5 hours of transcribed audio) show that multilingually trained SGMM shared parameters result in lower word error rates (WERs) than using those from a single source language. We also show that regularizing the estimation of the SGMM state vectors by penalizing their 1-norm helps to overcome numerical instabilities and leads to lower WER.

“…One of the common approach to model AFs is to estimate these features using ANNs; transform them using tandem feature extraction technique; concatenate them with the acoustic feature; and model them with HMMs [14,15,18,19]. We can adopt a similar approach for SL processing where the features representing different channels of information are extracted, concatenated…”

Section: Standard HMM Based Approach

mentioning

confidence: 99%

HMM-based Approaches to Model Multichannel Information in Sign Language Inspired from Articulatory Features-based Speech Processing

Tornay, Razavi, Camgöz et al.

2019

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Sign language conveys information through multiple channels, such as hand shape, hand movement, and mouthing. Modeling this multichannel information is a highly challenging problem. In this paper, we elucidate the link between spoken language and sign language in terms of production phenomenon and perception phenomenon. Through this link we show that hidden Markov model-based approaches developed to model "articulatory" features for spoken language processing can be exploited to model the multichannel information inherent in sign language for sign language processing.

Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs

Cited by 23 publications

References 21 publications
