Interspeech 2019
DOI: 10.21437/interspeech.2019-2793

Detection and Recovery of OOVs for Improved English Broadcast News Captioning

Abstract: In this paper we present a study on building various deep neural network-based speech recognition systems for automatic caption generation that can deal with out-of-vocabulary (OOV) words. We develop several kinds of systems using various acoustic (hybrid, CTC, attention-based neural networks) and language modeling (n-gram and RNN-based neural networks) techniques on broadcast news. We discuss various limitations that the proposed systems have and introduce methods to effectively use them to detect OOVs. For a…

Cited by 10 publications (9 citation statements)
References 26 publications
“…In another study, OOVs were detected by searching a confusion network [8]. After detection, recovery of the OOV word is generally performed [9,10]. These approaches can be used to recognize OOV words, but recovery often fails due to a lack of lexical information.…”
Section: Introduction (mentioning)
confidence: 99%
“…One workaround is to use a subword-based model, as they can theoretically create any word by outputting a sequence of shorter subword tokens [8,9,10]. Another approach is for the language model to contain a [unk] (unknown) token, which has as the pronunciation a phone LM trained on a lexicon of words with low counts, and then to try recover a word from the recognized phone sequence aligned with the [unk] token [11,12].…”
Section: Introduction (mentioning)
confidence: 99%
“…Many existing papers focusing on OOV recognition used private datasets, which makes results not comparable [2,5,8,11]. Or to create OOVs they keep the top ten thousand (or some other number that is significantly smaller than a real ASR system would use) in the vocabulary and use the rest as OOV words [4,5,10,8].…”
Section: Introduction (mentioning)
confidence: 99%
“…Usually when we are trying to recognize out of vocabulary word the system will try to cover it by a sequence of a shorter words acoustically similar to the origin word, this behaviour may sometimes lead to unwanted results. In some applications it is enough just to detect this occasions [6,7,8,9,10] while other applications require a mechanism to recover out of vocabulary words [11,12,13,14,15]. For the latter problem there are two major approaches.…”
Section: Introduction (mentioning)
confidence: 99%