2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8461600

Cross-Lingual Phoneme Mapping for Language Robust Contextual Speech Recognition

Citations: cited by 4 publications (7 citation statements)
References: 0 publications
“…In addition to on-the-fly rescoring, we are able to dynamically modify the joint pronunciation/language model graph by adding new paths at runtime. This allows us to insert dynamic pronunciations or language model classes [3,4]. We do not yet have a direct replacement for this additional functionality in LAS.…”
Section: On-the-fly Rescoring (mentioning)
confidence: 99%
“…Conventional contextual systems rely on being able to inspect and modify individual components of modular systems in order to function. For example, a standalone language model can support dynamic population of classes [3], and a standalone pronunciation model allows dynamic injection of pronunciations [4]. One drawback of this is that information consumed within one piece of the modeling may be useful elsewhere; acoustic signals could inform a language model or text normalizer.…”
Section: Introduction (mentioning)
confidence: 99%
“…Cross-lingual phoneme mapping has been used in conventional systems for recognizing foreign words [15]. First, a phoneme mapping is learned by aligning the pronunciations between foreign and target languages using TTS-synthesized audio and a pronunciation learning algorithm [18].…”
Section: Phoneme Mapping (mentioning)
confidence: 99%
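
To illustrate the idea of learning a phoneme mapping from aligned pronunciations, here is a minimal sketch. It assumes the foreign and target pronunciations have already been aligned one-to-one; the statement above obtains them from TTS-synthesized audio and a pronunciation learning algorithm, which this sketch does not reproduce. The function name `learn_mapping` and the majority-vote rule are illustrative assumptions, not the cited method.

```python
from collections import Counter, defaultdict


def learn_mapping(aligned_pairs):
    """Pick, for each foreign phoneme, its most frequent aligned target phoneme.

    `aligned_pairs` is a list of (foreign_phonemes, target_phonemes) tuples.
    Assumption: both sequences in a pair have equal length and are aligned
    position by position.
    """
    counts = defaultdict(Counter)
    for foreign, target in aligned_pairs:
        if len(foreign) != len(target):
            raise ValueError("sequences must be aligned one-to-one")
        for f, t in zip(foreign, target):
            counts[f][t] += 1
    # Majority vote per foreign phoneme (a hypothetical selection rule).
    return {f: c.most_common(1)[0][0] for f, c in counts.items()}


# Toy input: the single French/English pair quoted for "Créteil" (X-SAMPA).
pairs = [("k R e t E j".split(), "k r\\ E t E j".split())]
mapping = learn_mapping(pairs)
```

With more aligned pairs, the counts would decide between competing target phonemes; with the single pair here, each foreign phoneme simply maps to its aligned partner.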
“…We train our model using only American English data and thus its wordpieces and phoneme set (no data from foreign languages). In inference, given a list of foreign words, we bias the recognition using an English phoneme-level biasing FST, which is built by first tokenizing the words into foreign phonemes and then mapping them to English phonemes using [15]. For example, given a navigation query "directions to Créteil" and the assumption that the French word "Créteil" is in our biasing list, "Créteil" is first tokenized to French phonemes as "k R e t E j", and then mapped to English phonemes "k r\ E t E j" for biasing.…”
Section: Introduction (mentioning)
confidence: 99%
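
The mapping step in the "Créteil" example can be sketched as a table lookup. The table `FR_TO_EN` below is hypothetical and contains only the two substitutions implied by the quoted example; passing unlisted phonemes through unchanged is likewise an assumption, and building the biasing FST itself is not shown.

```python
# Hypothetical French-to-English phoneme table (X-SAMPA symbols).
# Only the substitutions visible in the quoted example are listed.
FR_TO_EN = {
    "R": "r\\",  # French uvular R -> English approximant r\
    "e": "E",    # French close-mid e -> English open-mid E
}


def map_phonemes(foreign_phonemes, table):
    """Map each foreign phoneme via `table`; unlisted phonemes pass through.

    The pass-through default is an assumption made for this sketch.
    """
    return [table.get(p, p) for p in foreign_phonemes]


french = "k R e t E j".split()         # "Créteil" tokenized to French phonemes
english = map_phonemes(french, FR_TO_EN)
print(" ".join(english))               # prints: k r\ E t E j
```

The resulting English phoneme sequence is what would then be compiled into the phoneme-level biasing FST described in the statement.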