2017 Hands-Free Speech Communications and Microphone Arrays (HSCMA)
DOI: 10.1109/hscma.2017.7895559
Efficient target activity detection based on recurrent neural networks

Abstract: This paper addresses the problem of Target Activity Detection (TAD) for binaural listening devices. TAD denotes the problem of robustly detecting the activity of a target speaker in a harsh acoustic environment comprising interfering speakers and noise ('cocktail party scenario'). In previous work, it has been shown that employing a Feed-forward Neural Network (FNN) for detecting the target speaker activity is a promising approach to combining the advantages of different TAD features (used as network inputs…

Cited by 4 publications (3 citation statements) · References 32 publications
“…RNNs take as input a sequence of vectors x = (x_0, …, x_{T-1}), compute a sequence of hidden-state vectors h = (h_0, …, h_{T-1}), and output a sequence y = (y_0, …, y_{T-1}) [51]. Inputs are processed in sequence, and the output of an individual step in the sequence y_i depends not only on the corresponding input vector x_i, but also on the hidden state at the previous step h_{i-1} [52]. The hidden state from the previous step allows the network to use dependencies among input vectors, e.g.…”
Section: F. Recurrent Neural Network
confidence: 99%
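The recurrence quoted above can be illustrated with a minimal sketch, assuming a simple tanh cell and arbitrary (hypothetical) layer sizes and weight names; this is not the architecture of the cited paper, only the generic step y_i = f(x_i, h_{i-1}) it describes:

```python
import numpy as np

def rnn_forward(x, W_xh, W_hh, W_hy, h0):
    """Process a sequence x = (x_0, ..., x_{T-1}).

    Each hidden state h_i depends on the current input x_i and the
    previous hidden state h_{i-1}, so the output y_i can exploit
    dependencies among earlier input vectors.
    """
    h_prev = h0
    hs, ys = [], []
    for x_i in x:
        h_i = np.tanh(W_xh @ x_i + W_hh @ h_prev)  # recurrence: carries context forward
        y_i = W_hy @ h_i                           # output at step i
        hs.append(h_i)
        ys.append(y_i)
        h_prev = h_i
    return hs, ys

# Toy dimensions (hypothetical): T steps, 3-dim inputs, 5-dim hidden, 2-dim outputs.
rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 4, 3, 5, 2
x = [rng.standard_normal(d_in) for _ in range(T)]
W_xh = 0.1 * rng.standard_normal((d_h, d_in))
W_hh = 0.1 * rng.standard_normal((d_h, d_h))
W_hy = 0.1 * rng.standard_normal((d_out, d_h))
hs, ys = rnn_forward(x, W_xh, W_hh, W_hy, np.zeros(d_h))
```

Note that, unlike an FNN applied frame by frame, changing an early input x_0 changes every later h_i and y_i, which is the property the citation statement highlights.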
“…The VAD system addressed in these studies is trained to detect all speech segments regardless of the speaker, which is referred to as standard VAD. On the other hand, studies have also examined VAD systems that detect the voice activity of a specific target speaker at the frame level, referred to as personalized VAD (PVAD) [17][18][19]. Compared with standard VAD, one strength of PVAD is that it does not detect voices contained in the background noise, which in a real environment often lead to unexpected responses or errors in downstream tasks.…”
Section: Introduction
confidence: 99%
“…Recently, we proposed combining features for multichannel TAD with knowledge of the target source position by means of an ANN [25]-[27]. The concept was proposed for robot audition and offers the advantages that scattering effects at the robot's head can be learned by the ANN, and that a flexible definition of desired detection thresholds is possible.…”
Section: Introduction
confidence: 99%