2021
DOI: 10.48550/arxiv.2106.04140
Preprint

Broadcasted Residual Learning for Efficient Keyword Spotting

Abstract: Keyword spotting is an important research field because it plays a key role in device wake-up and user interaction on smart devices. However, it is challenging to minimize errors while operating efficiently in devices with limited resources such as mobile phones. We present a broadcasted residual learning method to achieve high accuracy with small model size and computational load. Our method configures most of the residual functions as 1D temporal convolution while still allowing 2D convolution together using a…

Cited by 8 publications (8 citation statements)
References 24 publications
“…Therefore, the kernels from the spatially separable convolution turn into the frequency-side kernel and temporal-side kernels. According to previous studies with the spatially separable convolution in audio fields, the keyword spotting [32] and ASC [11] tasks have been shown to be beneficial in modeling fewer parameters as well as for performance improvements.…”
Section: Feature Map Extractor
Citation type: mentioning
confidence: 99%
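
The parameter saving that this citation statement attributes to spatially separable convolution can be illustrated with a short count. This is a minimal sketch, not taken from the cited papers: the function names, the channel counts, the 3x3 kernel size, and the choice to keep the intermediate channel count equal to the output channel count are all assumptions made for illustration, and biases are ignored.

```python
def conv2d_params(c_in: int, c_out: int, k_f: int, k_t: int) -> int:
    """Weights in a full 2D convolution with a (k_f x k_t) kernel."""
    return c_in * c_out * k_f * k_t


def separable_params(c_in: int, c_out: int, k_f: int, k_t: int) -> int:
    """Weights when the kernel is split into a frequency-side (k_f x 1)
    convolution followed by a temporal-side (1 x k_t) convolution.

    Assumption: the intermediate feature map has c_out channels.
    """
    frequency_side = c_in * c_out * k_f
    temporal_side = c_out * c_out * k_t
    return frequency_side + temporal_side


# Hypothetical layer: 16 input channels, 16 output channels, 3x3 kernel.
full = conv2d_params(16, 16, 3, 3)
split = separable_params(16, 16, 3, 3)
print(full, split)  # 2304 1536
```

With these assumed sizes the separable form uses two thirds of the weights of the full 2D kernel; the gap widens as the kernel grows, since the count scales with k_f + k_t rather than k_f * k_t.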
“…In trigger words detection, false rejection rate (FRR) and false accepts per hour (FAs) are commonly used as the evaluation metrics [27,28,29,30]. In speech command recognition, accuracy is commonly used for evaluation [31,32,33]. CSKWS can borrow the evaluation metrics from these related tasks.…”
Section: Evaluation Metrics
Citation type: mentioning
confidence: 99%
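
The three metrics named in this citation statement can be computed as in the sketch below. It is a minimal illustration, not code from the citing paper: the function names, argument names, and sample counts are hypothetical, and it assumes the false accepts are counted over a known duration of audio that contains no keyword.

```python
from typing import Sequence


def false_rejection_rate(n_missed: int, n_keyword_utterances: int) -> float:
    """FRR: fraction of true keyword utterances the detector failed to fire on."""
    if n_keyword_utterances <= 0:
        raise ValueError("need at least one keyword utterance")
    return n_missed / n_keyword_utterances


def false_accepts_per_hour(n_false_accepts: int, negative_audio_seconds: float) -> float:
    """FAs per hour: false triggers normalized by hours of keyword-free audio."""
    if negative_audio_seconds <= 0:
        raise ValueError("need a positive audio duration")
    return n_false_accepts / (negative_audio_seconds / 3600.0)


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Accuracy for speech command recognition: share of correct predictions."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical counts: 5 misses in 200 keyword utterances,
# 3 false triggers in 2 hours of keyword-free audio.
print(false_rejection_rate(5, 200))     # 0.025
print(false_accepts_per_hour(3, 7200))  # 1.5
```

FRR and FAs per hour are usually reported together at one operating point, because lowering the detection threshold trades one against the other.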
“…In general, KWS systems predetermine target keywords and are directly optimized for selected keywords. Although existing predefined KWS models show high detection performance [1,2,3], the necessity of a large dataset containing target keywords and inflexibility of changing target keywords hinder KWS models from expanding to various applications. When it comes to user-defined KWS, users can customize the target keywords with only a few enrollment samples [4,5,6,7] or in the form of string [8,9].…”
Section: Introduction
Citation type: mentioning
confidence: 99%