Unicode-based graphemic systems for limited resource languages

Gales, Mark J. F.; Knill, Kate; Ragni, Anton

doi:10.1109/icassp.2015.7178960

Cited by 45 publications

(39 citation statements)

References 10 publications

“…Moreover, graphemic lexicons can be easily expanded to include out-of-vocabulary (OOV) words, unlike phonetic lexicons. For languages with a close grapheme-to-phone mapping, graphemic HMM-based systems have been shown to perform similarly to phonetic systems [1,2,3]. However, for languages with irregular grapheme-to-phone mappings, such as English, graphemic HMM-based systems normally perform significantly worse than their phonetic counterparts [2].…”

Section: Introduction

mentioning

confidence: 99%

Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription

Wang, Chen, Gales, et al. 2018

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

State-of-the-art English automatic speech recognition systems typically use phonetic rather than graphemic lexicons. Graphemic systems are known to perform less well for English as the mapping from the written form to the spoken form is complicated. However, in recent years the representational power of deep-learning based acoustic models has improved, raising interest in graphemic acoustic models for English, due to the simplicity of generating the lexicon. In this paper, phonetic and graphemic models are compared for an English Multi-Genre Broadcast transcription task. A range of acoustic models based on lattice-free MMI training are constructed using phonetic and graphemic lexicons. For this task, it is found that having a long-span temporal history reduces the difference in performance between the two forms of models. In addition, system combination is examined, using parameter smoothing and hypothesis combination. As the combination approaches become more complicated the difference between the phonetic and graphemic systems further decreases. Finally, for all configurations examined the combination of phonetic and graphemic systems yields consistent gains.

“…In the absence of a phonetic lexicon, alternatively grapheme subword units based on the writing system have been explored in the literature (Kanthak and Ney, 2002a;Killer et al, 2003;Dines and Magimai.-Doss, 2007;Ko and Mak, 2014;Gales et al, 2015). The main advantage of using graphemes as subword units is that they make development of lexicons easy.…”

mentioning

confidence: 99%

Towards weakly supervised acoustic subword unit discovery and lexicon development using hidden Markov models

Razavi, Rasipuram, Magimai.-Doss 2018

Speech Communication

State-of-the-art automatic speech recognition and text-to-speech systems are based on subword units, typically phonemes. This necessitates a lexicon that maps each word to a sequence of subword units. Development of a phonetic lexicon for a language requires linguistic knowledge as well as human effort, which may not always be readily available, particularly for under-resourced languages. In such scenarios, an alternative approach is to use a lexicon based on units such as graphemes or subword units automatically derived from the acoustic data. This article focuses on automatic subword unit based lexicon development using methods that are employed for development of grapheme-based systems. Specifically, we present a novel hidden Markov model (HMM) based formalism for automatic derivation of subword units and pronunciation generation using only transcribed speech data. In this approach, the subword units are derived from the clustered context-dependent units in a grapheme-based system using the maximum-likelihood criterion. The subword unit based pronunciations are then generated by learning either a deterministic or a probabilistic relationship between the graphemes and the acoustic subword units (ASWUs). In this article, we first establish the proposed framework on a well-resourced language by comparing it against related approaches in the literature and investigating the transferability of the derived subword units to other domains. We then show the scalability of the proposed approach on real under-resourced scenarios by conducting studies on Scottish Gaelic, a genuinely under-resourced language, and comparing the approach against state-of-the-art grapheme-based ASR approaches. Our experimental studies on English show that the derived subword units can not only lead to better ASR systems compared to graphemes, but can also be transferred across domains. The experimental studies on Scottish Gaelic show that the proposed ASWU-based lexicon development approach scales without any language-specific considerations and leads to better ASR systems compared to a grapheme-based lexicon, including the case where ASR system performance is boosted through the use of acoustic models built with multilingual resources from resource-rich languages.
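
The citation statement above notes that graphemes make lexicon development easy. As a rough illustration only, the sketch below builds a graphemic lexicon by mapping each word to its letters. It is not the method of any paper listed here: the function names `graphemic_pronunciation` and `build_lexicon` are hypothetical, and the rule used (one unit per letter, combining marks attached to the preceding letter, everything else dropped) is a simplifying assumption.

```python
import unicodedata


def graphemic_pronunciation(word):
    """Return a list of grapheme units for a word (hypothetical helper).

    Assumption: one unit per letter; combining marks stay attached to the
    letter they modify; punctuation, digits and spaces are dropped.
    """
    units = []
    # NFD splits precomposed characters into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", word.lower()):
        category = unicodedata.category(ch)
        if category.startswith("M"):
            # Combining mark: attach it to the previous unit, if any.
            if units:
                units[-1] += ch
        elif category.startswith("L"):
            # Any letter starts a new grapheme unit.
            units.append(ch)
    return units


def build_lexicon(words):
    """Map each word to its graphemic pronunciation (hypothetical helper)."""
    return {word: graphemic_pronunciation(word) for word in words}


if __name__ == "__main__":
    for word, units in build_lexicon(["Gaelic", "don't", "café"]).items():
        print(word, " ".join(units))
```

Because the mapping needs only the written form, an out-of-vocabulary word can be added by calling the same function on it, with no linguistic expertise required.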

“…This dataset contains a fairly limited quantity of training data, and should therefore benefit much from system combination. A graphemic lexicon [26] is used. The standard 10 hour development set is used for decoding with a trigram language model trained on the VLLP manual transcriptions.…”

Section: Methods

mentioning

confidence: 99%

Sequence Student-Teacher Training of Deep Neural Networks

Wong, Gales 2016

Interspeech 2016

The performance of automatic speech recognition can often be significantly improved by combining multiple systems together. Though beneficial, ensemble methods can be computationally expensive, often requiring multiple decoding runs. An alternative approach, appropriate for deep learning schemes, is to adopt student-teacher training. Here, a student model is trained to reproduce the outputs of a teacher model, or ensemble of teachers. The standard approach is to train the student model on the frame posterior outputs of the teacher. This paper examines the interaction between student-teacher training schemes and sequence training criteria, which have been shown to yield significant performance gains over frame-level criteria. There are several possible options for integrating sequence training, including training of the ensemble and further training of the student. This paper also proposes an extension to the student-teacher framework, where the student is trained to emulate the hypothesis posterior distribution of the teacher, or ensemble of teachers. This sequence student-teacher training approach allows the benefit of student-teacher training to be directly combined with sequence training schemes. These approaches are evaluated on two speech recognition tasks: a Wall Street Journal based task and a low-resource Tok Pisin conversational telephone speech task from the IARPA Babel programme.
