2018 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt.2018.8639543

Information-Weighted Neural Cache Language Models for ASR

Abstract: Neural cache language models (LMs) extend the idea of regular cache language models by making the cache probability dependent on the similarity between the current context and the context of the words in the cache. We make an extensive comparison of 'regular' cache models with neural cache models, both in terms of perplexity and WER after rescoring first-pass ASR results. Furthermore, we propose two extensions to this neural cache model that make use of the content value/information weight of the word: firstly…
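The abstract contrasts 'regular' (count-based) cache models with neural cache models. As a point of reference, here is a minimal sketch of the regular cache baseline: the cache distribution is a normalized count of recently seen words, linearly interpolated with the base LM distribution. The interpolation weight `lam` and the toy vocabulary are assumptions of this sketch, not values from the paper.

```python
# Minimal sketch of a count-based ("regular") cache LM, assuming a fixed vocabulary
# of word ids and a base LM distribution from some first-pass model.
import numpy as np

def regular_cache_probs(cache_window, vocab_size):
    """Cache distribution proportional to word counts in the recent history (non-empty cache assumed)."""
    counts = np.zeros(vocab_size)
    for w in cache_window:            # w is a word id
        counts[w] += 1.0
    return counts / counts.sum()

def interpolate(p_lm, p_cache, lam=0.1):
    """Linear interpolation of the base LM and cache distributions."""
    return (1.0 - lam) * p_lm + lam * p_cache

# Toy usage: vocabulary of 5 words, recent history [2, 3, 2]
p_lm = np.full(5, 0.2)                # uniform base LM, purely for illustration
p = interpolate(p_lm, regular_cache_probs([2, 3, 2], 5))
assert np.isclose(p.sum(), 1.0)
```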

Cited by 6 publications (8 citation statements). References 11 publications.

“…Given that we used the negative logarithm, words with a higher frequency are reflected in lower scores. Word frequency was also calculated using the 5-gram model by Verwimp et al (2019).…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Word surprisal was calculated as the negative logarithm of the conditional probability of the considered word given the four preceding words. This conditional probability was obtained with a 5-gram model created by Verwimp et al (2019) for the Dutch material and by Yilmaz et al (2016) for the Frisian material. Using these models, word frequency was obtained by calculating the negative logarithm of the unigram probability of the word.…”
Section: Methods; citation type: mentioning
confidence: 99%
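The two statements above define surprisal as the negative log of a 5-gram conditional probability and word frequency as the negative log of the unigram probability. Below is a self-contained toy sketch of both measures; the dictionary-backed "model" is a stand-in for the actual 5-gram models by Verwimp et al (2019) and Yilmaz et al (2016), whose APIs are not shown here.

```python
import math

# Toy stand-in for a 5-gram model: one dict maps a 4-word context plus the next word
# to a conditional probability, another maps a word to its unigram probability.
cond = {(("the", "cat", "sat", "on"), "mat"): 0.2}
unigram = {"mat": 0.001, "the": 0.06}

def surprisal(word, context):
    """Negative log conditional probability of `word` given the four preceding words."""
    return -math.log(cond[(tuple(context[-4:]), word)])

def word_frequency_score(word):
    """Negative log unigram probability: more frequent words get lower scores."""
    return -math.log(unigram[word])

print(surprisal("mat", ["the", "cat", "sat", "on"]))   # ≈ 1.61
print(word_frequency_score("the"))                     # ≈ 2.81
```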
“…When predicting the next word w_t, the cache probability is computed based on the inner product between the current hidden state h_t and the cache hidden state h_{j-1}. Verwimp et al [21] assume that h_t captures the previous context, and that a cache probability based on the similarity between the current context and the context of the cache words better influences the next-word prediction.…”
Section: Embedding Extraction Methods; citation type: mentioning
confidence: 99%
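The statement above summarizes the neural cache mechanism: each cached word is scored by the inner product between the current hidden state h_t and the hidden state h_{j-1} stored with it, and those scores are turned into a distribution over the vocabulary. A minimal NumPy sketch of that computation follows; the flatness parameter `theta` and the toy dimensions are illustrative assumptions, not values from the paper.

```python
# Sketch of the neural cache distribution: p_cache(w) ∝ Σ_j 1{x_j = w} exp(theta * h_t · h_{j-1}).
import numpy as np

def neural_cache_probs(h_t, cache_states, cache_words, vocab_size, theta=0.3):
    """Softmax over inner-product scores of cache entries, pooled per word id."""
    scores = theta * cache_states @ h_t          # one inner product per cache entry
    weights = np.exp(scores - scores.max())      # numerically stable softmax numerator
    weights /= weights.sum()
    p = np.zeros(vocab_size)
    for w, weight in zip(cache_words, weights):  # pool the weight of identical words
        p[w] += weight
    return p

# Toy usage: hidden size 4, three cached words with ids [7, 2, 7], vocabulary of 10
rng = np.random.default_rng(0)
h_t = rng.standard_normal(4)                     # current hidden state
cache_states = rng.standard_normal((3, 4))       # h_{j-1} stored for each cached word
p_cache = neural_cache_probs(h_t, cache_states, [7, 2, 7], vocab_size=10)
assert np.isclose(p_cache.sum(), 1.0)
```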