2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
DOI: 10.1109/icassp.2015.7178990

Combination of search techniques for improved spotting of OOV keywords

Abstract: The most common pipelines in keyword spotting involve some kind of speech recognition, which leads to the generation of sets of plausible hypotheses (e.g., word lattices), which are subsequently explored. The case of out-of-vocabulary (OOV) keywords is of special interest, because it requires representing keywords and/or lattices in an alternative format, so that the two can match. A number of techniques for dealing with OOV keywords have appeared in the literature; here, we focus on (i) fuzzy-phonetic search …

Cited by 11 publications
(5 citation statements)
References 17 publications
“…In the experiments, we observed that filtering the probability scores with a moving average filter of window width W and uniform weights 1/W improves the STD performance. The effect of this filtering is evaluated in Sec.…”
Section: Std Algorithm (mentioning)
confidence: 97%
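The filtering described in this statement can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `smooth_scores`, the use of NumPy, and the zero-padded handling of the sequence edges are all assumptions made for the example. Only the window width W and the uniform weights 1/W come from the quoted text.

```python
import numpy as np


def smooth_scores(scores, window):
    """Smooth a sequence of probability scores with a moving average.

    Every tap of the filter has the same weight 1/window, as in the
    quoted statement. Edge handling is an assumption of this sketch:
    the sequence is implicitly zero-padded, so values near both ends
    are pulled towards zero.
    """
    scores = np.asarray(scores, dtype=float)
    if not 1 <= window <= scores.size:
        raise ValueError("window must be between 1 and len(scores)")
    kernel = np.full(window, 1.0 / window)  # uniform weights 1/W
    # mode="same" keeps the output aligned with, and as long as, the input.
    return np.convolve(scores, kernel, mode="same")


if __name__ == "__main__":
    raw = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(smooth_scores(raw, 3))
```

A single isolated spike is spread over W neighbouring frames, which is the usual reason for smoothing frame-level scores before thresholding.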
“…By recognizing and pre-indexing the spoken utterances the in-vocabulary (IV) queries could be directly found in the word index. The handling of out-of-vocabulary (OOV) terms consists of a much wider spectrum of methods [1] including recognition and indexing of sub-word units (phonemes, syllables or word fragments) [2,3,4], the use of IV proxy words [5,6] or the use of acoustic embeddings and similarity metrics in a vector space [7,8]. The acoustic embeddings often play a role also in the query-by-example (QbE) task in the low-resourced setup but the idea of neural-network-based projection of the query and the utterance into a single space could be reused in the more general STD task employing the standard speech recognition models [9,10,11].…”
Section: Introduction (mentioning)
confidence: 99%
“…The 1-best grapheme hypothesis H and the alignment A is used as pivot sequence in confusion network generation [15], i.e. the alternative graphemes are added parallel for each [footnote 1: Here we use the interval notation also for the integer intervals:…]”
Section: Grapheme Confusion Network (mentioning)
confidence: 99%
“…The spoken term detection (STD) task is a widely studied field of speech processing. The STD emerged as a variant of traditional keyword spotting which speeds up the search phase by offline pre-processing and indexing of the searched data [1], where the pre-processing costs are counterweighted by the speed of an online search. A conventional approach to STD is to use the DNN-HMM hybrid speech recognizer to transform the input audio data into a set of word lattices from which the inverted word index is built.…”
Section: Introduction (mentioning)
confidence: 99%
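The offline indexing and fast online lookup described in this statement can be illustrated with a toy inverted word index. This is only a sketch under stated assumptions: the tuple layout `(utterance_id, word, start, end, score)` and the function names `build_inverted_index` and `search` are hypothetical, and a real system would index lattice arcs with posterior scores rather than a flat list of hypotheses.

```python
from collections import defaultdict


def build_inverted_index(hypotheses):
    """Build a word -> postings map from recognized word hypotheses.

    `hypotheses` is an iterable of (utterance_id, word, start, end, score)
    tuples; this layout is an assumption of the sketch.
    """
    index = defaultdict(list)
    for utt_id, word, start, end, score in hypotheses:
        index[word.lower()].append((utt_id, start, end, score))
    # Sort each posting list once, offline, so the best hits come first.
    for postings in index.values():
        postings.sort(key=lambda p: p[3], reverse=True)
    return dict(index)


def search(index, term):
    """Online lookup: a dictionary access, no pass over the audio.

    A term the recognizer never produced (for example an OOV word)
    returns an empty list.
    """
    return index.get(term.lower(), [])


if __name__ == "__main__":
    hyps = [
        ("utt1", "keyword", 0.5, 0.9, 0.80),
        ("utt2", "Keyword", 1.2, 1.7, 0.95),
        ("utt2", "spotting", 1.7, 2.3, 0.60),
    ]
    idx = build_inverted_index(hyps)
    print(search(idx, "keyword"))
    print(search(idx, "lattice"))
```

The empty result for a word outside the recognizer's vocabulary shows why OOV terms need the alternative representations discussed in the abstract.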
“…Score normalization is crucial for the right balance between true positives and false alarms. In this work, the raw scores are first normalized with a linear fit model, after which keyword-specific thresholding and exponential normalization (KST) is applied [23].…”
Section: Kws System (mentioning)
confidence: 99%