2014 IEEE Spoken Language Technology Workshop (SLT) 2014
DOI: 10.1109/slt.2014.7078612

EM-based phoneme confusion matrix generation for low-resource spoken term detection

Abstract: The idea of using a data-driven phoneme confusion matrix (PCM) to enhance speech recognition and retrieval performance is not new to the speech community. Although empirical results show various degrees of improvements brought by introducing a PCM, the underlying data-driven processes introduced in most papers are rather ad-hoc and lack rigorous statistical justifications. In this paper we will focus on the statistical aspects of PCM generation, propose and justify a novel expectation-maximization based algorit…

Cited by 8 publications (6 citation statements)
References 20 publications
“…The goal of system combination [8,9,10] is to merge KWS results obtained on lattices generated by different underlying ASR systems. Another group is represented by score normalization methods, such as sum-to-one [9], query length normalization [10], keyword-specific thresholding [11] [12,13,14,15]. Finally, another group of postprocessing methods calibrates scores via optimizing the KWS metric [16,17,18,8].…”
Section: Related Work
confidence: 99%
“…On the other hand, the subword-based approach has the unique advantage that it can detect terms that consist of words that are not in the vocabulary of the recognizer, i.e., out-of-vocabulary (OOV) terms. The combination of these two approaches has been proposed in order to exploit the relative advantages of word and subword-based strategies [17,32,33,36,44,63-70].…”
Section: Spoken Term Detection Overview
confidence: 99%
“…A standard tool used in spoken term detection and speech recognition for quantifying variation is the phone confusion matrix [14,16-19] which captures the confusion statistics between phones thus providing a way of defining commonalities or groups [20-22]. However, a confusion matrix can suffer from data sparseness due to the fact that although some phones may be phonetically similar, only a small number of confusions may be found with one or more other phones.…”
Section: Introduction
confidence: 99%
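
The last citation statement describes a phone confusion matrix as a table of confusion statistics between phones, and notes that it can be sparse. The sketch below illustrates only that idea: it counts confusions from aligned (reference, recognized) phone pairs and row-normalizes them. It is not the EM-based algorithm from the paper. The function name `confusion_matrix`, the four-phone inventory, and the aligned pairs are all invented for illustration.

```python
from collections import Counter


def confusion_matrix(aligned_pairs, phones):
    """Row-normalized confusion estimates from (reference, recognized) pairs.

    Hypothetical helper for illustration; a plain relative-frequency
    estimate, not the EM-based method described in the abstract.
    """
    counts = Counter(aligned_pairs)
    matrix = {}
    for ref in phones:
        # Total number of times this reference phone was observed.
        row_total = sum(counts[(ref, hyp)] for hyp in phones)
        matrix[ref] = {
            hyp: (counts[(ref, hyp)] / row_total if row_total else 0.0)
            for hyp in phones
        }
    return matrix


# Toy data (assumed): a tiny inventory and a handful of aligned pairs.
phones = ["p", "b", "t", "d"]
aligned_pairs = [
    ("p", "p"), ("p", "p"), ("p", "b"),
    ("b", "b"), ("b", "p"),
    ("t", "t"), ("t", "d"),
    ("d", "d"),
]

pcm = confusion_matrix(aligned_pairs, phones)

# Cells with no observed confusion stay at zero; with little data most
# of the matrix looks like this, which is the sparseness problem.
zero_cells = sum(1 for r in phones for h in phones if pcm[r][h] == 0.0)
print(pcm["p"])
print("cells with no observed confusion:", zero_cells)
```

With this toy data, the pair (p, t) is never observed, so its estimate is zero even if the two phones are acoustically close. That is the kind of gap the citing paper refers to as data sparseness.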