2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2012.6288904
Investigations on exemplar-based features for speech recognition towards thousands of hours of unsupervised, noisy data

Abstract: The acoustic models in state-of-the-art speech recognition systems are based on phones in context that are represented by hidden Markov models. This modeling approach may be limited in that it is hard to incorporate long-span acoustic context. Exemplar-based approaches are an attractive alternative, in particular if massive data and computational power are available. Yet, most of the data at Google are unsupervised and noisy. This paper investigates an exemplar-based approach under this yet not well understood…

Cited by 11 publications (12 citation statements); references 9 publications.
“…Whole-word approaches typically involve, at some level, template matching. For example, in template-based speech recognition [4,5], word scores are computed from dynamic time warping (DTW) distances between an observed segment and training segments of the hypothesized word. In query-by-example search, putative matches are typically found by measuring the DTW distance between the query and segments of the search database [6,7,8,9].…”
Section: Introduction (mentioning; confidence: 99%)
“…This data pruning reduced the database size or the model size by about 30%, and consequently saved computation time and memory usage in speech recognition. In [21], exemplar-based word-level features were investigated for large-scale speech recognition. These features were combined with the acoustic and language scores of the first-pass model through a segmental conditional random field to rescore word lattices.…”
Section: Related Work (mentioning; confidence: 99%)
“…However, they are still difficult to use in large vocabulary continuous speech recognition (LVCSR) due to their need for intensive computation time and storage space. Newly proposed methods, such as template pruning and filtering [19], template-like dimension reduction of speech observations [20], and template matching in the second-pass decoding search [21], are beginning to address this problem. In general, there is a tradeoff between the costs in computation and space and the accuracy in recognition.…”
Section: Introduction (mentioning; confidence: 99%)
“…Data-driven automatic speech recognition (ASR) techniques (De Wachter et al., 2003; Aradilla et al., 2005; Deselaers et al., 2007; Sundaram and Bellegarda, 2012; Sainath et al., 2012; Heigold et al., 2012; Sun et al., 2014) became popular in the last decade as a viable alternative after the long dominance of statistical acoustic modeling in the form of Gaussian mixture models (GMM) in hidden Markov models (HMM) (Bourlard et al., 1996). Templates or exemplars are labeled speech segments of multiple lengths extracted from training data, each associated with a certain class, i.e.…”
Section: Introduction (mentioning; confidence: 99%)