2021
DOI: 10.1186/s13636-020-00193-1
Dynamic out-of-vocabulary word registration to language model for speech recognition

Abstract: We propose a method of dynamically registering out-of-vocabulary (OOV) words by assigning the pronunciations of these words to pre-inserted OOV tokens, i.e., by editing the tokens' pronunciations. To do this, when training the language model (LM) for speech recognition, we add OOV tokens to an additional, partial copy of our corpus, either at random positions or at selected part-of-speech (POS) tags in the chosen utterances. This yields an LM containing OOV tokens to which pronunciations can then be assigned. We also investigate the…
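A minimal sketch of the token-insertion step described in the abstract. The function names, the tagging scheme, and the per-utterance insertion rate are illustrative assumptions, not details from the paper:

```python
import random

def insert_oov_tokens(utterances, pos_tags=None, target_pos=None,
                      rate=0.1, token="<OOV>", seed=0):
    """Return a partial copy of the corpus with OOV tokens inserted.

    If target_pos is given, words whose POS tag is in that set are
    replaced by the OOV token; otherwise the token is inserted at a
    random position in a randomly selected fraction of utterances.
    """
    rng = random.Random(seed)
    augmented = []
    for i, words in enumerate(utterances):
        new = list(words)
        if target_pos and pos_tags:
            # POS-based insertion: substitute matching words.
            new = [token if tag in target_pos else w
                   for w, tag in zip(words, pos_tags[i])]
        elif rng.random() < rate:
            # Random insertion: add the token at an arbitrary slot.
            new.insert(rng.randrange(len(new) + 1), token)
        augmented.append(new)
    return augmented
```

Training an n-gram LM on the original corpus plus this augmented copy leaves `<OOV>` with nonzero probability in context, so pronunciations can later be attached to it.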

Cited by 5 publications (2 citation statements)
References 16 publications
“…This, in turn, results in the hypothesis always containing words in the lexicon leading to errors in recognition. There have been efforts to detect OOVs using filler models [41,42,43] where there are placeholders for OOVs which can then be replaced with the extended vocabulary for improved recognition. Confidence measures are another indication of OOV presence.…”
Section: OOV Detection and Recovery (mentioning)
confidence: 99%
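The filler-model idea in the statement above, i.e. replacing placeholder tokens in the hypothesis with words from an extended vocabulary, can be sketched roughly as follows. The lexicon, the phone strings, and the similarity score via `difflib` are illustrative assumptions, not the cited papers' actual method:

```python
from difflib import SequenceMatcher

def fill_placeholders(hypothesis, recognized_prons, extended_lexicon,
                      token="<OOV>"):
    """Replace each OOV placeholder in a hypothesis with the
    extended-vocabulary word whose pronunciation best matches the
    phone string recognized at that position.

    extended_lexicon maps word -> space-separated phone string;
    recognized_prons lists one phone string per placeholder, in order.
    """
    prons = iter(recognized_prons)
    out = []
    for word in hypothesis:
        if word == token:
            phones = next(prons)
            # Pick the word with the most similar pronunciation.
            best = max(
                extended_lexicon,
                key=lambda w: SequenceMatcher(
                    None, extended_lexicon[w], phones).ratio())
            out.append(best)
        else:
            out.append(word)
    return out
```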
“…The ratio of the number of words in the corpus to the vocabulary size is important; in the case of "Decision to Leave", the ratio is about 4.12. To explain the ratio, we can use the analogy of Lego blocks: the fewer the types of blocks used (vocabulary) and the larger the structure created by stacking them (corpus), the easier it is to learn the order in which the blocks are applied [27]. Naturally, a language model built from only one movie script has low sentence-generation performance, but it is adequate for obtaining word vectors and performing vector operations, as in this study [28].…”
Section: Language Model - Structure (mentioning)
confidence: 99%
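The corpus-to-vocabulary ratio quoted above (about 4.12 for the "Decision to Leave" script) is simply the token/type ratio of the text. A minimal sketch, with whitespace tokenization as a simplifying assumption:

```python
def token_type_ratio(corpus_text):
    """Ratio of running words (tokens) to distinct words (types)."""
    tokens = corpus_text.lower().split()
    return len(tokens) / len(set(tokens))

# Example: 6 tokens, 4 distinct types -> ratio 1.5.
print(token_type_ratio("to be or not to be"))
```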