2008
DOI: 10.1145/2036916.2036917

Speech retrieval from unsegmented Finnish audio using statistical morpheme-like units for segmentation, recognition, and retrieval

Abstract: This article examines the use of statistically discovered morpheme-like units for Spoken Document Retrieval (SDR). The morpheme-like units (morphs) are used both for language modeling in speech recognition and as index terms. Traditional word-based methods suffer from out-of-vocabulary words. If a word is not in the recognizer vocabulary, any occurrence of the word in speech will be missing from the transcripts. The problem is especially severe for languages with a high number of dist…

Cited by 7 publications (7 citation statements)
References 36 publications
“…This type of morphological analysis can be useful for alleviating language model sparsity inherent to morphologically rich languages (Hirsimäki et al, 2006; Turunen and Kurimo, 2011; Luong et al, 2013). Particularly, we focus on a low-resource learning setting, in which only a small amount of annotated word forms are available for model training, while unannotated word forms are available in abundance.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In particular, model hyperparameters can be selected to optimize segmentation performance, rather than some generative objective, such as likelihood. Special cases of hyperparameter selection include the weighted objective function (Kohonen, Virpioja, and Lagus 2010), data selection (Virpioja, Kohonen, and Lagus 2011;Sirts and Goldwater 2013), and grammar template selection (Sirts and Goldwater 2013). As for the weighted objective function and grammar template selection, the weights and templates are optimized to maximize segmentation accuracy.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…As for the weighted objective function and grammar template selection, the weights and templates are optimized to maximize segmentation accuracy. Meanwhile, data selection is based on the observation that omitting some of the training data can improve segmentation accuracy (Virpioja, Kohonen, and Lagus 2011;Sirts and Goldwater 2013).…”
Section: Discussion
Citation type: mentioning
confidence: 99%