Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014
DOI: 10.3115/v1/d14-1095
Morphological Segmentation for Keyword Spotting

Abstract: We explore the impact of morphological segmentation on keyword spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this paper, we augment a state-of-the-art KWS system with sub-word units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve overall performance of KWS systems. Syllabic units, however, rival the performance of morph…

Cited by 28 publications (18 citation statements). References 7 publications.
“…For example, the English word joblessness can be segmented as job+less+ness. When processing morphologically-rich languages, this helps reduce the sparsity created by the higher OOV rate due to productive morphology, and, empirically, has been shown to be beneficial in a diverse variety of downstream tasks, e.g., machine translation (Clifton and Sarkar, 2011), speech recognition (Afify et al., 2006), keyword spotting (Narasimhan et al., 2014) and parsing (Seeker and Çetinoğlu, 2015). Supervised and unsupervised approaches have been successful, but, when annotated data is available, supervised approaches typically greatly outperform unsupervised ones (Ruokolainen et al., 2013).…”
Section: Morphological Segmentation
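The surface-segmentation idea quoted above (a word split into morphs whose concatenation recovers the word) can be sketched with a quick check; the lookup table and helper below are hypothetical illustrations, not part of any cited system.

```python
# Minimal sketch of surface morphological segmentation: a word form is
# split into substrings (morphs) whose concatenation reproduces the word.
# The dictionary entries are illustrative assumptions, not system output.

SEGMENTATIONS = {
    "joblessness": ["job", "less", "ness"],  # example from the quoted text
    "walked": ["walk", "ed"],                # assumed extra illustration
}

def is_surface_segmentation(word: str, morphs: list[str]) -> bool:
    """True iff concatenating the morphs reproduces the word form exactly."""
    return "".join(morphs) == word

for word, morphs in SEGMENTATIONS.items():
    assert is_surface_segmentation(word, morphs)

print("+".join(SEGMENTATIONS["joblessness"]))  # -> job+less+ness
```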
“…For detecting OOV keywords, one must either resort to (i) hybrid fuzzy phonetic search strategies, in which recognition is done in terms of words, but the search is done fuzzily based on phonetic strings, allowing for inexact matches [1,8,9,2], or (ii) recognition in terms of shorter units [10,3,11,12,13,5,14] that have a higher chance of allowing a new word. In this paper, we focus mainly on the first category.…”
Section: Keyword Spotting Pipelines for Dealing With OOV
“…When dealing with OOV keywords, things become more complicated; as demonstrated in the literature [3,5], state-… We thank all members of the BBN Speech and Language group for useful discussions, especially those on the Babel project. We also acknowledge the contribution of the BUT partners of the BABELON team who provided MLP acoustic features.…”
Section: Introduction
“…Morphological segmentation is useful for NLP applications such as automatic speech recognition (Afify et al., 2006), keyword spotting (Narasimhan et al., 2014), machine translation (Clifton and Sarkar, 2011) and parsing (Seeker and Çetinoğlu, 2015). Prior work cast the problem as surface segmentation: a word form w is segmented into a sequence of substrings whose concatenation is w. In this paper, we introduce the problem of canonical segmentation: w is analyzed as a sequence of canonical morphemes, based on a set of word forms that have been "canonically" annotated for supervised learning.…”
Section: Introduction
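The surface vs. canonical distinction quoted above can be illustrated with one word: a surface segmentation must concatenate back to the written form, while a canonical analysis may use underlying morphemes that differ from the surface substrings. The example word and its canonical analysis below are assumptions for illustration, not data from the cited paper.

```python
# Sketch contrasting surface and canonical segmentation. The word
# "funniest" and its analyses are hypothetical illustrations.

def is_surface(word: str, morphs: list[str]) -> bool:
    # Surface property: the substrings concatenate back to the word form.
    return "".join(morphs) == word

word = "funniest"
surface = ["funni", "est"]    # substrings of the written form
canonical = ["funny", "est"]  # underlying (canonical) morphemes

assert is_surface(word, surface)        # "funni" + "est" == "funniest"
assert not is_surface(word, canonical)  # "funny" + "est" != "funniest"
```

The failing check for the canonical analysis is the point: recovering canonical morphemes requires modeling orthographic changes (here, y -> i), which is why the quoted paper treats it as a separate, supervised problem.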