Interspeech 2019
DOI: 10.21437/interspeech.2019-3176

Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

Abstract: Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to generate pronunciations for out-of-vocabulary words that do not exist in the pronunciation lexicons (mappings like "e c h o" → "E k oU"). Most G2P systems are monolingual and based on traditional joint-sequence n-gram models [1,2]. As an alternative, we present a single end-to-end trained neural G2P model that shares the same encoder and decoder across multi…
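The abstract describes a single encoder-decoder shared across languages. A minimal sketch of how such a model's input might be formatted, assuming a hypothetical scheme in which graphemes are drawn from a shared Latin-like inventory and a language token is prepended so the model can condition on the target language (the paper's exact tokenization may differ):

```python
def make_g2p_input(word: str, lang: str) -> list[str]:
    """Format a word for a shared multilingual G2P encoder.

    A language token (e.g. "<en>") is prepended so a single
    encoder-decoder can condition on the target language; the
    graphemes themselves come from a shared Latin-like inventory.
    Illustrative sketch only, not the paper's exact scheme.
    """
    return [f"<{lang}>"] + list(word.lower())

# The abstract's example word "echo", tagged for English:
print(make_g2p_input("echo", "en"))  # ['<en>', 'e', 'c', 'h', 'o']
```

The decoder would then emit phoneme symbols such as "E k oU" from a shared output inventory.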

Cited by 9 publications (3 citation statements) · References 15 publications
“…Tan et al. [14] proposed first training separate models for each language and then performing knowledge distillation from each language-specific model into a single multilingual model for multilingual translation. Wang et al. [12] presented a Grapheme-to-Phoneme (G2P) model that shares the same encoder and decoder across multiple languages by combining universal symbol inventories of Latin-like alphabets with cross-linguistically shared feature representations. Toshniwal et al. [15] take the union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model on data combined from different languages for speech recognition.…”
Section: Multilingual Learning
Mentioning confidence: 99%
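The grapheme-union approach mentioned above (Toshniwal et al.) merges each language's symbol set into one shared input vocabulary. A small sketch with hypothetical per-language grapheme sets (real systems would derive these from training transcripts):

```python
# Hypothetical per-language grapheme sets; in practice these are
# extracted from each language's training data.
graphemes = {
    "en": {"e", "c", "h", "o"},
    "de": {"e", "c", "h", "o", "ü"},
    "fr": {"e", "c", "h", "o", "é"},
}

# The union across languages gives one shared input vocabulary,
# so a single sequence-to-sequence model can read all languages.
shared_vocab = sorted(set().union(*graphemes.values()))
print(shared_vocab)  # ['c', 'e', 'h', 'o', 'é', 'ü']
```

Language-specific symbols (here "ü", "é") survive the union, so no language loses coverage; only the embedding table grows.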
“…Furthermore, end-to-end automatic speech recognition (ASR) models that use character- or word-based modeling units handle out-of-vocabulary (OOV) words poorly unless pronunciation dictionaries are employed. As a result, a growing number of researchers are exploring the development of multilingual pronunciation dictionaries [1][2][3][4][5]. A multilingual pronunciation dictionary is constructed to solve the pronunciation problem in multilingual speech technology.…”
Section: Introduction
Mentioning confidence: 99%
“…For instance, when a low-resource language shares certain phonemes with a high-resource language, an existing pronunciation dictionary for the latter can assist in constructing one for the former. This methodology has proven especially advantageous for low-resource languages constrained by limited data [5,6].…”
Section: Introduction
Mentioning confidence: 99%
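The reuse idea in the passage above can be sketched with set operations: phonemes a low-resource language shares with a high-resource one can borrow that language's dictionary entries, while the remainder needs language-specific handling. The inventories below are hypothetical placeholders, not data from the cited work:

```python
# Hypothetical IPA-like phoneme inventories; real ones would come
# from existing pronunciation dictionaries for each language.
high_resource = {"p", "t", "k", "a", "i", "u", "s"}
low_resource = {"p", "t", "a", "i", "ʔ"}

# Shared phonemes can reuse the high-resource dictionary's
# pronunciations; uncovered phonemes need dedicated modeling.
shared = low_resource & high_resource
uncovered = low_resource - high_resource
print(sorted(shared))     # ['a', 'i', 'p', 't']
print(sorted(uncovered))  # ['ʔ']
```

The larger the shared set, the more pronunciation knowledge transfers, which is why this helps most when the two languages are phonologically close.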