“…Regarding the features used for query/utterance representation, [5,[13][14][15] employ Gaussian posteriorgrams; [16] proposes an i-vector-based approach for feature extraction; [17] uses phone log-likelihood ratio-based features; [18] employs posteriorgrams derived from various unsupervised tokenizers, supervised tokenizers, and semi-supervised tokenizers; [19] employs posteriorgrams derived from a Gaussian mixture model (GMM) tokenizer, phoneme recognition, and acoustic segment modelling; [11,15,[20][21][22][23][24][25][26] use phoneme posteriorgrams; [11,[27][28][29] employ bottleneck features; [30] employs posteriorgrams from non-parametric Bayesian models; [31] employs articulatory class-based posteriorgrams; [32] proposes an intrinsic spectral analysis; and [33] is based on the unsupervised segment-based bag of an acoustic words framework. All these studies employ the standard DTW algorithm for query search, except for [13], which employs the NS-DTW algorithm, [15,24,25,28,30], which employ the subsequence DTW (S-DTW) algorithm, [14], which presents a variant of the S-DTW algorithm, and [26], which employs the segmental DTW algorithm.…”