Interspeech 2020
DOI: 10.21437/interspeech.2020-3132
Neural Architecture Search for Keyword Spotting

Abstract: Keyword spotting aims to identify specific keyword audio utterances. In recent years, deep convolutional neural networks have been widely utilized in keyword spotting systems. However, their model architectures are mainly based on off-the-shelf backbones such as VGG-Net or ResNet, rather than being specially designed for the task. In this paper, we utilize neural architecture search to design convolutional neural network models that can boost keyword spotting performance while maintaining an acceptable memory …

Cited by 33 publications (21 citation statements). References 41 publications.
“…In contrast, the vast majority of previous NAS research has been focused on computer vision applications [21,22,23]. Existing NAS works in the speech community investigated non-TDNN based architectures [24,25,26,27,28,29].…”
Section: Introduction
confidence: 99%
“…By controlling the standard deviation β of the injected noise, we can tune the search algorithm to trade off the number of skip connections against overall performance. NoisyDARTS finds the best model of all three methods on the V1 dataset, with on average nearly 8× fewer parameters than the contemporary work NAS2 [20]. This much-improved efficiency allows our models to be deployed on IoT devices with low computation cost.…”
Section: Searching Results
confidence: 99%
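The NoisyDARTS excerpt above describes injecting zero-mean Gaussian noise with standard deviation β into the skip-connection path during search, so that skip connections do not trivially dominate the architecture. A minimal sketch of that idea follows; the names `noisy_skip` and `beta` are illustrative assumptions, not the paper's API:

```python
import numpy as np

def noisy_skip(x, beta, rng=None):
    """Skip connection with additive zero-mean Gaussian noise.

    During architecture search, perturbing the identity (skip) path with
    noise of standard deviation `beta` makes skip connections less
    trivially attractive to the gradient-based search; at beta=0 this
    reduces to a plain skip connection.
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(loc=0.0, scale=beta, size=x.shape)
    return x + noise
```

With β = 0 the path is an exact identity, and raising β increases how strongly the search must rely on the learned operations rather than the skip path.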
“…Apart from our previous work NASC [16], which adopted our two-stage one-shot NAS approach FairNAS [17] for acoustic scene classification, DARTS [11] has also been applied to speaker recognition in AutoSpeech [18] and to speech recognition in DARTS-ASR [19]. There is a notable contemporary work [20] that also applies DARTS to KWS. However, due to its complex cell-based network topology, the searched networks may be of limited use for direct deployment on smart devices.…”
Section: Neural Architecture Search and Audio
confidence: 99%
“…Moving from fully-connected FFNN to CNN acoustic modeling was a natural step, taken back in 2015 [28]. By exploiting local time-frequency correlations in speech, CNNs are able to outperform fully-connected FFNNs for acoustic modeling in deep KWS with fewer parameters [28], [32], [72], [86], [96], [117], [122]-[125]. One attractive feature of CNNs is that the number of multiplications can easily be limited to meet computational constraints by adjusting hyperparameters such as filter striding and kernel and pooling sizes.…”
Section: B. Convolutional Neural Network
confidence: 99%
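The excerpt above notes that a CNN's multiplication count can be controlled via stride, kernel size, and channel counts. A back-of-the-envelope sketch of that accounting, under the assumption of 'same' padding before striding (the helper name `conv2d_mults` is hypothetical):

```python
def conv2d_mults(h, w, c_in, c_out, k, stride=1):
    """Approximate multiply count of one 2-D convolution layer.

    Each output element of each output channel costs k*k*c_in multiplies;
    the output spatial size shrinks by the stride (ceil division, i.e.
    'same' padding assumed). Doubling the stride roughly quarters the cost.
    """
    out_h = -(-h // stride)  # ceil(h / stride)
    out_w = -(-w // stride)  # ceil(w / stride)
    return out_h * out_w * c_out * k * k * c_in
```

For example, a 3x3 convolution over a 32x32 single-channel input with 64 output filters costs 4x fewer multiplies at stride 2 than at stride 1, which is exactly the kind of knob the excerpt describes for meeting a compute budget.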
“…Therefore, it is obvious that the non-streaming mode lacks some realism from a practical point of view. Despite this, isolated word classification is considered by a number of deep KWS works, e.g., [16], [30], [32], [48]- [52], [58], [69], [82], [89], [99], [109], [125], [128]- [130]. We believe that this is because of the simpler experimental framework with respect to that of the dynamic or streaming case.…”
Section: A. Non-streaming Mode
confidence: 99%