2015 23rd European Signal Processing Conference (EUSIPCO)
DOI: 10.1109/eusipco.2015.7362666
Query by example search with segmented dynamic time warping for non-exact spoken queries

Cited by 11 publications (8 citation statements) | References 10 publications

“…Regarding the features used for query/utterance representation, Gaussian posteriorgrams are employed in [22,29,40,41]; an i-vector-based approach for feature extraction is proposed in [42]; phone log-likelihood ratio-based features are used in [43]; posteriorgrams derived from various unsupervised tokenizers, supervised tokenizers, and semi-supervised tokenizers are employed in [44]; and posteriorgrams derived from a Gaussian mixture model (GMM) tokenizer, phoneme recognition, and acoustic segment modeling are used in [45]. Phoneme posteriorgrams have been widely used [34,41,46–54], and bottleneck features as well [34,55–60].…”
Section: Methods Based On Template Matching
confidence: 99%
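The statement above surveys posteriorgram-style features for query-by-example search. As a purely illustrative sketch (not taken from the paper or any of the cited works), the snippet below builds the frame-level distance matrix that DTW-based search typically operates on, using the negative log inner product between posterior vectors; the function name and the `eps` clamp are my own choices.

```python
import numpy as np

def posteriorgram_distance_matrix(query_post, utt_post, eps=1e-10):
    """Frame-level distance matrix between two posteriorgrams.

    query_post: (Tq, C) array, each row a per-frame posterior over C classes
    utt_post:   (Tu, C) array for the search utterance
    Returns a (Tq, Tu) matrix of -log inner products, a distance commonly
    paired with DTW in posteriorgram-based query-by-example search.
    """
    dots = query_post @ utt_post.T            # inner product of every frame pair
    return -np.log(np.maximum(dots, eps))     # clamp to avoid log(0)
```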
“…Regarding the features used for query/utterance representation, [5,13–15] employ Gaussian posteriorgrams; [16] proposes an i-vector-based approach for feature extraction; [17] uses phone log-likelihood ratio-based features; [18] employs posteriorgrams derived from various unsupervised tokenizers, supervised tokenizers, and semi-supervised tokenizers; [19] employs posteriorgrams derived from a Gaussian mixture model (GMM) tokenizer, phoneme recognition, and acoustic segment modelling; [11,15,20–26] use phoneme posteriorgrams; [11,27–29] employ bottleneck features; [30] employs posteriorgrams from non-parametric Bayesian models; [31] employs articulatory class-based posteriorgrams; [32] proposes an intrinsic spectral analysis; and [33] is based on the unsupervised segment-based bag-of-acoustic-words framework. All these studies employ the standard DTW algorithm for query search, except for [13], which employs the NS-DTW algorithm; [15,24,25,28,30], which employ the subsequence DTW (S-DTW) algorithm; [14], which presents a variant of the S-DTW algorithm; and [26], which employs the segmental DTW algorithm.…”
Section: Methods Based On Template Matching Of Features
confidence: 99%
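Several of the works listed above rely on subsequence DTW (S-DTW), in which the query must be matched in full but may start and end anywhere in the utterance. The following is a minimal NumPy sketch of that recursion, assuming a precomputed frame-distance matrix such as the one above; it illustrates the general technique rather than the exact implementation used in any cited study.

```python
import numpy as np

def subsequence_dtw_score(dist):
    """Subsequence DTW over a (Tq, Tu) frame-distance matrix.

    Rows index query frames, columns index utterance frames. The match may
    begin and end at any column. Returns the best accumulated cost
    normalised by query length, and the column where the match ends.
    """
    Tq, Tu = dist.shape
    acc = np.full((Tq, Tu), np.inf)
    acc[0, :] = dist[0, :]                        # free starting point in the utterance
    for i in range(1, Tq):
        for j in range(Tu):
            best_prev = acc[i - 1, j]             # vertical step
            if j > 0:
                best_prev = min(best_prev,
                                acc[i - 1, j - 1],  # diagonal step
                                acc[i, j - 1])      # horizontal step
            acc[i, j] = dist[i, j] + best_prev
    end = int(np.argmin(acc[-1, :]))              # free end point
    return acc[-1, end] / Tq, end
```

A lower returned score indicates a better candidate detection; sweeping this over all utterances and ranking by score gives the usual S-DTW-based search pipeline.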
“…In the 2014 Query-by-Example Speech Search Task (QUESST) [11], one task was non-exact matching, in which test occurrences could contain small morphological variations with regard to the lexical form of the query. To solve this problem, Xu et al. [12] proposed a partial matching strategy in which all partial phone sequences of a query were used to search for matching instances; Proença et al. [13] […]. Although some of these works attempted to solve the non-exact matching problem, they all used DTW-based matching on frame-level representations, which has been shown to be outperformed by distance-based matching on acoustic word embeddings [2,3,4].…”
Section: Related Work
confidence: 99%
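The closing claim contrasts frame-level DTW matching with distance-based matching on acoustic word embeddings. Below is a minimal sketch of the latter, assuming some external model has already mapped the query and each candidate segment to fixed-dimensional vectors; the helper names are hypothetical and the embedding model itself is outside the sketch.

```python
import numpy as np

def embedding_distance(query_emb, segment_emb):
    """Cosine distance between fixed-dimensional acoustic word embeddings.

    Both arguments are 1-D vectors produced by some acoustic word embedding
    model. A lower value indicates a closer match.
    """
    q = query_emb / (np.linalg.norm(query_emb) + 1e-10)
    s = segment_emb / (np.linalg.norm(segment_emb) + 1e-10)
    return 1.0 - float(q @ s)

def rank_candidates(query_emb, candidate_embs):
    """Rank candidate segment embeddings (N, D) by distance to the query."""
    dists = np.array([embedding_distance(query_emb, c) for c in candidate_embs])
    return np.argsort(dists), dists
```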