2010 IEEE International Conference on Acoustics, Speech and Signal Processing 2010
DOI: 10.1109/icassp.2010.5494968

Stochastic pronunciation modelling and soft match for out-of-vocabulary spoken term detection

Abstract: A major challenge faced by a spoken term detection (STD) system is the detection of out-of-vocabulary (OOV) terms. Although a subword-based STD system is able to detect OOV terms, performance reduction is always observed compared to in-vocabulary terms. One challenge that OOV terms bring to STD is the pronunciation uncertainty. A commonly used approach to address this problem is a soft matching procedure, and the other is the stochastic pronunciation modelling (SPM) proposed by the authors. In this paper we …

Cited by 18 publications (20 citation statements)
References: 11 publications
“…Compared to our previous work [25], [26], we are now able to present a clearer understanding of the particular variation exhibited in pronunciations of OOV terms, including a subjective experiment to illustrate this; we also now propose a complete theory of stochastic pronunciation modelling and report more reliable experimental results than previously given.…”
Section: Motivations (citation type: mentioning)
confidence: 87%
“…Soft match is the most common technique for mitigating acoustic variation; it allows for some mismatch between the pronunciation predicted for the search term and the phoneme sequences in the lattice and typically involves a penalty based on either edit distance [13], [36], [37], acoustic confusion [21], [22], [24], [38] or model distance [39], [40]. Lexical deviation, however, has not been widely investigated until recently [3], [25].…”
Section: A. OOV Uncertainty (citation type: mentioning)
confidence: 99%
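
The edit-distance variant of soft match mentioned in the statement above can be illustrated with a minimal sketch. It assumes flat phoneme sequences, unit costs for insertion, deletion and substitution, and a fixed acceptance threshold; the names `edit_distance`, `soft_match` and `max_penalty` are hypothetical and are not taken from the cited papers, which search phoneme lattices rather than single sequences.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (lists of strings)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def soft_match(query, candidate, max_penalty=1):
    """Accept a candidate whose distance from the query is at most max_penalty.

    max_penalty is an illustrative threshold (an assumption), not a value
    reported in the source.
    """
    return edit_distance(query, candidate) <= max_penalty


if __name__ == "__main__":
    predicted = ["k", "ae", "t"]  # pronunciation predicted for the search term
    observed = ["k", "ah", "t"]   # phoneme sequence found in the recognizer output
    print(edit_distance(predicted, observed), soft_match(predicted, observed))
```

With an exact match, one substituted vowel would cause a miss; allowing a bounded penalty lets the detector tolerate that kind of mismatch at the cost of more false alarms.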
“…Cambridge University's HTK is used to train the acoustic models and for lattice generation, and the SRI LM toolkit is used to train the LM. An enhanced joint-multigram model [16,54] trained with the AMI dictionary is used to predict pronunciations for the OOV terms. The Lattice2Multigram tool from Speech@FIT (Brno University of Technology) is used to search for detections within the phoneme lattices.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…Additionally, certain letter-to-sound (LTS) approaches need to be designed for inferring pronunciations of OOV terms. For English, we employed an enhanced joint-multigram model (Deligne et al., 1995; Wang et al., 2009a) trained with the AMI dictionary for this purpose; for Spanish, we chose a simple mapping approach to derive the pronunciation. More detailed information about the experimental settings can be found in (Wang, 2009) and (Tejedor, 2009 …”
Section: System Configuration (citation type: mentioning)
confidence: 99%