2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
DOI: 10.1109/icassp.2015.7178960

Unicode-based graphemic systems for limited resource languages

Abstract: Large vocabulary continuous speech recognition systems require a mapping from words, or tokens, into sub-word units to enable robust estimation of acoustic model parameters, and to model words not seen in the training data. The standard approach to achieve this is to manually generate a lexicon where words are mapped into phones, often with attributes associated with each of these phones. Context-dependent acoustic models are then constructed using decision trees where questions are asked based on the phones an…
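As a rough illustration of the word-to-sub-word mapping the abstract describes, a graphemic lexicon can be sketched by splitting each word into its letters instead of looking up a hand-built phone sequence. The helper name `graphemic_lexicon`, the decision to skip non-letter characters, and the use of Unicode character names as per-unit attributes are assumptions made for this sketch, not the procedure from the paper.

```python
import unicodedata


def graphemic_lexicon(words):
    """Map each word to a list of (grapheme, unicode_name) pairs.

    Hypothetical helper for illustration only: characters that are not
    letters (punctuation, digits, combining marks) are simply skipped.
    """
    lexicon = {}
    for word in words:
        units = []
        # Normalise so that precomposed and decomposed spellings agree.
        for ch in unicodedata.normalize("NFC", word.lower()):
            if not unicodedata.category(ch).startswith("L"):
                continue  # keep letters only (assumption of this sketch)
            # The Unicode name serves as a simple attribute of the unit.
            units.append((ch, unicodedata.name(ch, "UNKNOWN")))
        lexicon[word] = units
    return lexicon


if __name__ == "__main__":
    for word, units in graphemic_lexicon(["cat", "it's"]).items():
        print(word, [g for g, _ in units])
```

Because the mapping is derived from spelling alone, a word that was not in the training data needs no manual lexicon entry: passing it to the function yields its unit sequence directly.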

Citations: cited by 45 publications (39 citation statements)
References: 10 publications
“…Moreover, graphemic lexicons can be easily expanded to include out-of-vocabulary (OOV) words, unlike phonetic lexicons. For languages with a close grapheme-to-phone mapping, graphemic HMM-based systems have been shown to perform similarly to phonetic systems [1,2,3]. However, for languages with irregular grapheme-to-phone mappings, such as English, graphemic HMM-based systems normally perform significantly worse than their phonetic counterparts [2].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In the absence of a phonetic lexicon, alternatively grapheme subword units based on the writing system have been explored in the literature (Kanthak and Ney, 2002a; Killer et al., 2003; Dines and Magimai.-Doss, 2007; Ko and Mak, 2014; Gales et al., 2015). The main advantage of using graphemes as subword units is that they make development of lexicons easy.…”
Citation type: mentioning
confidence: 99%
“…This dataset contains a fairly limited quantity of training data, and should therefore benefit much from system combination. A graphemic lexicon [26] is used. The standard 10 hour development set is used for decoding with a trigram language model trained on the VLLP manual transcriptions.…”
Section: Methods
Citation type: mentioning
confidence: 99%