2017 Hands-Free Speech Communications and Microphone Arrays (HSCMA)
DOI: 10.1109/hscma.2017.7895559
Efficient target activity detection based on recurrent neural networks

Abstract: This paper addresses the problem of Target Activity Detection (TAD) for binaural listening devices. TAD denotes the problem of robustly detecting the activity of a target speaker in a harsh acoustic environment comprising interfering speakers and noise ('cocktail party scenario'). In previous work, it has been shown that employing a Feed-forward Neural Network (FNN) for detecting the target speaker activity is a promising approach to combining the advantages of different TAD features (used as network inputs…

Cited by 4 publications (3 citation statements) · References 32 publications
“…RNNs take as input a sequence of vectors x = (x_0, …, x_{T-1}), compute a sequence of hidden-state vectors h = (h_0, …, h_{T-1}), and output a sequence y = (y_0, …, y_{T-1}) [51]. Inputs are processed in sequence, and the output of an individual step in the sequence y_i depends not only on the corresponding input vector x_i, but also on the hidden state at the previous step h_{i-1} [52]. The hidden state from the previous step allows the network to use dependencies among input vectors, e.g.…”
Section: F. Recurrent Neural Network
confidence: 99%
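The recurrence quoted above can be illustrated with a minimal sketch, assuming a simple tanh cell and arbitrary (hypothetical) layer sizes and weight names; this is not the architecture of the cited paper, only the generic step y_i = f(x_i, h_{i-1}) it describes:

```python
import numpy as np

def rnn_forward(x, W_xh, W_hh, W_hy, h0):
    """Process a sequence x = (x_0, ..., x_{T-1}).

    Each hidden state h_i depends on the current input x_i and the
    previous hidden state h_{i-1}, so the output y_i can exploit
    dependencies among earlier input vectors.
    """
    h_prev = h0
    hs, ys = [], []
    for x_i in x:
        h_i = np.tanh(W_xh @ x_i + W_hh @ h_prev)  # recurrence: carries context forward
        y_i = W_hy @ h_i                           # output at step i
        hs.append(h_i)
        ys.append(y_i)
        h_prev = h_i
    return hs, ys

# Toy dimensions (hypothetical): T steps, 3-dim inputs, 5-dim hidden, 2-dim outputs.
rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 4, 3, 5, 2
x = [rng.standard_normal(d_in) for _ in range(T)]
W_xh = 0.1 * rng.standard_normal((d_h, d_in))
W_hh = 0.1 * rng.standard_normal((d_h, d_h))
W_hy = 0.1 * rng.standard_normal((d_out, d_h))
hs, ys = rnn_forward(x, W_xh, W_hh, W_hy, np.zeros(d_h))
```

Note that, unlike an FNN applied frame by frame, changing an early input x_0 changes every later h_i and y_i, which is the property the citation statement highlights.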
“…The VAD system addressed in these studies is trained to detect all speech segments regardless of the speaker, which is referred to as standard VAD. On the other hand, studies have also examined VAD systems that detect the voice activity of a specific target speaker at the frame level, referred to as personalized VAD (PVAD) [17][18][19]. Compared with standard VAD, one strength of PVAD is that it does not detect voices contained in the background noise, which in a real environment often lead to unexpected responses or errors in downstream tasks.…”
Section: Introduction
confidence: 99%
“…Recently, we proposed combining features for multichannel TAD with knowledge of the target source position by means of an ANN [25]-[27]. The concept was proposed for robot audition and offers the advantages that scattering effects at the robot's head can be learned by the ANN, and that a flexible definition of desired detection thresholds is possible.…”
Section: Introduction
confidence: 99%