2020
DOI: 10.48550/arxiv.2010.15446
Preprint

Progressive Voice Trigger Detection: Accuracy vs Latency

Abstract: We present an architecture for voice trigger detection for virtual assistants. The main idea in this work is to exploit information in the words that immediately follow the trigger phrase. We first demonstrate that including more audio context after a detected trigger phrase does yield a more accurate decision. However, waiting to listen to more audio each time increases latency. Progressive Voice Trigger Detection allows us to trade off latency and accuracy by accepting clear trigger candidates…

Citations: cited by 1 publication (3 citation statements)
References: 12 publications (25 reference statements)
“…1. Block diagrams of (a) conventional multi-task learning for KWS [20,21] and (b) our proposed approach. In the conventional approach, a last layer is simply split into two branches, one for phoneme prediction and one for phrase prediction.…”
Section: Overview (mentioning)
confidence: 99%
“…In the multi-task learning framework, the model is trained using both phonetic loss and phrase loss [3,20,21]. Let us assume that we sample N utterances for a mini-batch from a combined set of an ASR dataset and a KWS dataset.…”
Section: Multi-task Learning (mentioning)
confidence: 99%