Binary Speech Features for Keyword Spotting Tasks

Riviello, Alexandre; David, Jean‐Pierre

doi:10.21437/interspeech.2019-1877

Cited by 7 publications

(11 citation statements)

References 11 publications


“…The same philosophy can be applied to speech features. Emerging research [69] studies two kinds of low-precision speech representations: linearly-quantized log-Mel spectrogram and power variation over time, derived from log-Mel spectrogram, represented by only 2 bits. Experimental results show that using 8-bit log-Mel spectra yields same KWS accuracy as employing full-precision MFCCs.…”

Section: Low-precision Features (mentioning)

confidence: 99%

“…Furthermore, KWS performance degradation is insignificant when exploiting 2-bit precision speech features. As the authors of [69] state, this fact might indicate that much of the spectral information is superfluous when attempting to spot a set of keywords. In [82], we independently arrived at the same finding.…”

Section: Low-precision Features (mentioning)

confidence: 99%
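
The linear quantization that the two statements above refer to can be illustrated with a minimal sketch. It is not the cited authors' implementation: the helper names `quantize_linear` and `dequantize_linear`, the min/max scaling choice, and the toy log-Mel values are all assumptions made for illustration only.

```python
# Minimal sketch of uniform (linear) quantization of log-Mel features.
# Assumption: values are scaled between the observed minimum and maximum;
# the cited work may use a different range or rounding rule.

def quantize_linear(frames, n_bits):
    """Map each value to an integer code in [0, 2**n_bits - 1] on a uniform grid."""
    flat = [v for frame in frames for v in frame]
    lo, hi = min(flat), max(flat)
    levels = (1 << n_bits) - 1  # highest integer code
    span = hi - lo
    if span == 0:
        # Constant input: every value maps to code 0.
        return [[0 for _ in frame] for frame in frames], lo, hi
    codes = [[round((v - lo) / span * levels) for v in frame] for frame in frames]
    return codes, lo, hi


def dequantize_linear(codes, lo, hi, n_bits):
    """Map integer codes back to approximate feature values."""
    levels = (1 << n_bits) - 1
    step = (hi - lo) / levels
    return [[lo + c * step for c in frame] for frame in codes]


# Toy log-Mel values (made up): 2 frames x 3 Mel bands.
log_mel = [[-11.5, -7.2, -3.0], [-9.8, -4.1, -0.5]]

codes8, lo, hi = quantize_linear(log_mel, 8)   # 8-bit codes, 0..255
codes2, _, _ = quantize_linear(log_mel, 2)     # 2-bit codes, 0..3
approx8 = dequantize_linear(codes8, lo, hi, 8)
```

With 8 bits the reconstruction error per value is at most half a quantization step, whereas the 2-bit codes keep only four coarse levels.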

“…Residual learning, proposed by He et al. [126] for image recognition, is widely considered to implement state-of-the-art acoustic models for deep KWS [30], [32], [50]-[52], [57], [67], [69], [78]. In short, residual learning models are constructed by introducing a series of shortcut connections linking non-consecutive layers (as exemplified by Figure 6), which helps to better train very deep CNN models.…”

Section: B. Convolutional Neural Network (mentioning)

confidence: 99%

“…Therefore, it is obvious that the non-streaming mode lacks some realism from a practical point of view. Despite this, isolated word classification is considered by a number of deep KWS works, e.g., [16], [30], [32], [48]-[52], [58], [69], [82], [89], [99], [109], [125], [128]-[130]. We believe that this is because of the simpler experimental framework with respect to that of the dynamic or streaming case.…”

Section: A. Non-streaming Mode (mentioning)

confidence: 99%

“…The publicly available Google Speech Commands Dataset [153], [154] has become the de facto open benchmark for [30] for further details). Multiple recent deep KWS works have employed either the first version [16], [30], [32], [43], [48]-[52], [57], [58], [67], [69], [70], [86], [90], [100], [125] or the second version [32], [47], [48], [53], [70], [82], [89], [90], [99], [100], [109], [128]-[130], [159], [175] of the Google Speech Commands Dataset. Despite how valuable this open reference is for KWS research and development, we can raise two relevant points of criticism:…”

Section: A. Google Speech Commands Dataset (mentioning)

confidence: 99%


Deep Spoken Keyword Spotting: An Overview

López-Espejo, Hansen, Jensen

2022

IEEE Access


Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.

INDEX TERMS: Keyword spotting, deep learning, acoustic model, small footprint, robustness.

