2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017
DOI: 10.1109/icassp.2017.7952659

Deep neural network based wake-up-word speech recognition with two-stage detection

Citations: cited by 27 publications (17 citation statements)
References: 10 publications
“…1) Experimental Setup: In contrast to the previous section, a fixed vocabulary KWS task is used in this section. A Mandarin dataset similar to [4] and [27] is used in the training stage. It consists of two parts: general speech corpus and keyword specific corpus.…”
Section: B Wakeup-word Recognition In Mandarin (mentioning)
confidence: 99%
“…Comparing to threshold (Morgan et al, 1990; Naylor et al, 1992; Junkawitsch et al, 1997; Keshet et al, 2009; Wöllmer et al, 2009b,a; Li and Wang, 2014; Chen et al, 2014a; Gruenstein et al, 2017; Benisty et al, 2018; Myer and Tomar, 2018)
Viterbi (Rose and Paul, 1990; Feng and Mazor, 1992; Wilcox and Bush, 1992; Rohlicek et al, 1993; Knill and Young, 1996; Junkawitsch et al, 1997; Zheng et al, 1999; Liu et al, 2000; Vasilache and Vasilache, 2009; Tabibian et al, 2011; Leow et al, 2012; Zhu et al, 2013; Kumatani et al, 2017; Ge and Yan, 2017; Sun et al, 2017)
Forward-Backward algorithm (Wilcox and Bush, 1992; Rohlicek et al, 1993)
DTW (Zeppenfeld and Waibel, 1992; Kosonocky and Mammone, 1995; Kurniawati et al, 2012; Zehetner et al, 2014; Hou et al, 2016)
Likelihood ratio (Jansen and Niyogi, 2009c; Szöke et al, 2010)
Fuzzy logic (Manor and Greenberg, 2017)
Table 10 The metrics used in studied sources.…”
Section: Decoding Approach Sources (mentioning)
confidence: 99%
“…FOM (Gish et al, 1990; Rose and Paul, 1990; Naylor et al, 1992; Zeppenfeld and Waibel, 1992; Chang and Lippmann, 1994; Gish and Ng, 1993; Rohlicek et al, 1993; Knill and Young, 1996; Junkawitsch et al, 1997; Zheng et al, 1999; Szöke et al, 2005; Lehtonen, 2005; Jansen and Niyogi, 2009a,c; Szöke et al, 2010; Tabibian et al, 2011; Bohac, 2012; Sangeetha and Jothilakshmi, 2014; Sadhu and Ghosh, 2017; Tabibian et al, 2018)
EER (Szöke et al, 2010; Bohac, 2012)
Accuracy (Morgan et al, 1990; Ida and Yamasaki, 1998; Ge and Yan, 2017; Benisty et al, 2018; Fernández-Marqués et al, 2018)
FA/kw/h (Rohlicek et al, 1989; Vroomen and Normandin, 1992; Feng and Mazor, 1992; Leow et al, 2012; Kavya and Karjigi, 2014)
ROC (Marcus, 1992; Siu et al, 1994; Keshet et al, 2009; Wöllmer et al, 2009b, 2013; Shokri et al, 2013; Sadhu and Ghosh, 2017; Kumatani et al, 2017)
Detection rate (Feng and Mazor, 1992; Khne et al, 2004;…”
Section: Metrics Sources (mentioning)
confidence: 99%
“…In earlier research, a support vector machine (SVM) was used for the WUW recognition system [1]. Because the performance of deep neural network (DNN) systems has proven to be highly effective in many fields, there have been numerous efforts to build DNN-based WUW recognizers in various ways [2]–[12]. In [2], the bidirectional long short-term memory (BLSTM)-based end-to-end model was used to calculate the post-probability similar to the hybrid system, and the weighted finite-state transducers (WFSTs) were used to generate a confidence score from the calculated post-probability.…”
Section: Introduction (mentioning)
confidence: 99%