Interspeech 2018
DOI: 10.21437/interspeech.2018-1759
On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification

Abstract: The task of building a personalized keyword detection system that also performs text-dependent speaker verification (TDSV) has received substantial interest recently. Conventional approaches to this task develop the TDSV and wake-word detection systems separately. In this paper, we show that TDSV and keyword spotting (KWS) can be jointly modeled using the convolutional long short-term memory (CLSTM) model architecture, where an initial convolutional feature map is further processed by an LSTM recurre…
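The abstract describes the CLSTM pipeline only at a high level: a convolutional layer produces a feature map from the input spectro-temporal features, and an LSTM processes that map into a single embedding shared by both tasks. A minimal numpy sketch of that data flow is below; all shapes, the ReLU nonlinearity, the gate ordering, and the idea of using the final hidden state as a joint KWS/TDSV embedding are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def conv1d_feature_map(x, filters):
    """Convolve log-mel features x (T, F) with filters (K, F, C).

    Returns a (T - K + 1, C) feature map after a ReLU.
    (Valid convolution and ReLU are assumptions for this sketch.)
    """
    K, F, C = filters.shape
    T = x.shape[0]
    out = np.zeros((T - K + 1, C))
    for t in range(T - K + 1):
        window = x[t:t + K]                       # (K, F) slice of the input
        out[t] = np.einsum('kf,kfc->c', window, filters)
    return np.maximum(out, 0.0)                   # ReLU

def lstm_forward(seq, Wx, Wh, b):
    """Run a single-layer LSTM over seq (T, C) and return the final hidden state.

    Wx: (C, 4H), Wh: (H, 4H), b: (4H,); gates ordered [input, forget, cell, output].
    The final hidden state serves as the shared embedding for keyword
    detection and speaker scoring in this sketch.
    """
    H = Wh.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x_t in seq:
        z = x_t @ Wx + h @ Wh + b                 # all four gates at once, (4H,)
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                         # cell-state update
        h = o * np.tanh(c)                        # hidden-state update
    return h

# Toy usage: 50 frames of 40-dim features, 16 conv channels, hidden size 32.
rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 40))
filters = 0.1 * rng.standard_normal((5, 40, 16))
Wx = 0.1 * rng.standard_normal((16, 128))
Wh = 0.1 * rng.standard_normal((32, 128))
b = np.zeros(128)
embedding = lstm_forward(conv1d_feature_map(feats, filters), Wx, Wh, b)
```

In a joint setup of this kind, the single embedding would then feed two scoring heads (keyword presence and speaker similarity), which is what allows the two tasks to share the convolutional and recurrent parameters.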

Cited by 21 publications (30 citation statements); references 15 publications.
“…The presented model achieves 78.5 ± 2.3% accuracy on a 2-way one-shot learning task, while typical state-of-the-art results are around 90% [5]. This is due to both the limited size of the network and the limited size of the dataset.…”
Section: Speaker Verification
confidence: 87%
“…Recent studies tackle KWS for both efficient computation [4] and small footprint [6,7], while not many studies address these problems in the context of SV [3,2]. Efforts exist for either jointly solving both tasks [5] or solving a single task in the presence of background noise [10]. In contrast, this work proposes a solution that jointly solves both tasks in the harder scenario of competing talkers.…”
Section: Related Work
confidence: 99%
“…From an engineering point of view, speech recognition and speaker recognition are independent tasks. However, the human brain interprets and decodes speaker traits and linguistic content from speech in a joint, corroborative manner [36,37]. Similarly, a unified framework for speaker and language recognition has been attempted using a shared DNN that outperforms the single-task implementation [38].…”
Section: Introduction
confidence: 99%