Voice activity detection in nonstationary noise

Tanyer, Suleyman Gokhun; Özer, Hamza

doi:10.1109/89.848229

Cited by 170 publications

(67 citation statements)

References 11 publications

“…In fact, when the signal of interest is not detected, the state of the system continuously iterates, in the FSM, between the coarse and fine processing states (see footnote 6). For small values of the SNR, the delay is around 4 s, and it would not be possible to detect signals with a duration shorter than this maximum delay; recall that the entire duration of the signal "slice" of interest is 8 s. This is due to the large number of frames that are processed before the presence of an atypical signal is declared in the coarse processing phase. For large values of the SNR, instead, the signal of interest is correctly detected within 0.5 s, thus making the proposed algorithm almost real-time.…”

Section: A. Ideal Audio Signals (mentioning; confidence: 99%)
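The coarse/fine iteration described in the excerpt can be illustrated with a minimal sketch of a two-state FSM. This is not the citing paper's algorithm: the coarse test (a frame-energy threshold held for several consecutive frames), the pluggable fine test, and all names (`State`, `run_fsm`, `frames_to_declare`) are hypothetical choices made only to show why declaring a candidate in the coarse phase introduces a delay.

```python
from enum import Enum, auto


class State(Enum):
    COARSE = auto()  # cheap screening applied to every frame
    FINE = auto()    # costlier confirmation of a flagged candidate


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def run_fsm(frames, coarse_threshold, fine_test, frames_to_declare=3):
    """Return the frame index at which the signal is declared, or None.

    Hypothetical rule: the coarse phase flags an atypical segment only after
    `frames_to_declare` consecutive frames exceed `coarse_threshold`; only then
    is the (assumed more expensive) `fine_test` run on the current frame.
    """
    state = State.COARSE
    run = 0  # consecutive frames above the coarse threshold
    for i, frame in enumerate(frames):
        if state is State.COARSE:
            run = run + 1 if frame_energy(frame) > coarse_threshold else 0
            if run >= frames_to_declare:
                state = State.FINE
        if state is State.FINE:
            if fine_test(frame):
                return i  # signal of interest detected
            # Fine phase rejected the candidate: iterate back to coarse.
            state, run = State.COARSE, 0
    return None


# Usage: 5 quiet frames followed by 5 loud frames.
frames = [[0.0] * 4] * 5 + [[1.0] * 4] * 5
detected_at = run_fsm(frames, coarse_threshold=0.5, fine_test=lambda f: True)
never = run_fsm(frames, coarse_threshold=0.5, fine_test=lambda f: False)
```

Here the loud segment starts at frame 5 but detection is declared at frame 7, a lag of `frames_to_declare - 1` frames that comes entirely from the coarse phase; when the fine test keeps rejecting, the machine keeps cycling between the two states and returns `None`.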

“…Since the manufacturer provides the microphone characterization only for frequencies higher than 100 Hz, the microphone behavior is unpredictable for frequencies below this threshold, although the matching circuit performance is known in this band [12]. Therefore, the signal components in this band are highly distorted, and in our analysis with "realistic" acquired signals we neglect the signal contributions below 100 Hz. [Footnote 6: One may consider a maximum number of iterations after which the system is reset.]…”

Section: B. Experimentally Acquired Audio Signals (mentioning; confidence: 99%)
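One simple way to neglect contributions below 100 Hz is to zero the corresponding DFT bins. The sketch below assumes that approach (the excerpt does not say how the citing paper discards the band) and uses a naive O(N²) DFT only to stay dependency-free; the sampling rate, frame length, and function names (`discard_below`, `energy`) are illustrative assumptions.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def discard_below(spectrum, fs, cutoff_hz=100.0):
    """Zero every DFT bin whose absolute frequency is below cutoff_hz."""
    n = len(spectrum)
    out = list(spectrum)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # bins above n/2 represent negative frequencies
        if freq < cutoff_hz:
            out[k] = 0j
    return out


def energy(spectrum):
    """Signal energy via Parseval's theorem: sum|X[k]|^2 / N equals sum x[t]^2."""
    return sum(abs(c) ** 2 for c in spectrum) / len(spectrum)


fs, n = 1000, 200  # 5 Hz bin spacing (assumed values)
low = [math.sin(2 * math.pi * 50 * t / fs) for t in range(n)]    # 50 Hz tone
high = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]  # 200 Hz tone
low_kept = energy(discard_below(dft(low), fs))
high_kept = energy(discard_below(dft(high), fs))
```

The 50 Hz tone is removed almost entirely, while the 200 Hz tone keeps its full energy (100, i.e. N/2 for a unit-amplitude sine). Both test tones fall exactly on DFT bins, so there is no spectral leakage; real signals would leak some energy across the cutoff.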

“…Unlike the previous problem of sound recognition, in this case one wants to detect the time intervals during which a (known) audio signal of interest (typically voice) appears, given that it will (sooner or later) appear for sure. A first possible strategy to detect the presence of an audio signal of interest, through a time domain-based analysis, consists of evaluating the energy of the audio signal samples, as in [6]. Another class of VAD algorithms is based on statistical analysis of the signal frames' spectra, obtained through the discrete Fourier transform (DFT) [7], as discussed in [8], [9].…”

(mentioning; confidence: 99%)
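The energy-based strategy mentioned in the excerpt can be sketched as frame-wise energy thresholding against a tracked noise floor. This is a generic illustration, not the method of [6] or of Tanyer and Özer: the threshold ratio, the smoothing factor `alpha`, and the assumption that the first `init_frames` frames contain no speech are all hypothetical.

```python
def energy_vad(samples, frame_len, ratio=4.0, alpha=0.95, init_frames=5):
    """Label each non-overlapping frame True (speech) or False (non-speech).

    Hypothetical sketch: the noise floor is initialised from the first
    `init_frames` frames (assumed speech-free) and then tracked by exponential
    smoothing over frames classified as noise, so it can follow slowly
    varying (nonstationary) noise.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    energies = [sum(x * x for x in f) / frame_len for f in frames]
    warmup = energies[:init_frames]
    noise = sum(warmup) / max(1, len(warmup))
    labels = []
    for e in energies:
        is_speech = e > ratio * noise
        if not is_speech:
            noise = alpha * noise + (1 - alpha) * e  # adapt only during noise
        labels.append(is_speech)
    return labels


# Usage: low-level noise, a louder burst, then noise again.
signal = [0.1] * 40 + [1.0] * 20 + [0.1] * 20
labels = energy_vad(signal, frame_len=4)
```

Updating the noise estimate only in frames labelled as non-speech keeps speech energy from inflating the threshold; the trade-off is that a sudden, lasting rise in noise level may be misread as speech until the estimate catches up.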

Low-Complexity Hybrid Time-Frequency Audio Signal Pattern Detection

2013

Abstract: In this paper, we present a low-complexity hybrid time-frequency approach for the detection of audio signal patterns by means of appropriate spectral signatures. The proposed detection algorithm evolves through two main processing phases, denoted as coarse and fine, respectively. The evolution through these two phases is described by a finite state machine (FSM) model. Using distinct processing phases serves to reduce the computational complexity and thus the energy consumption. Our results show that the proposed approach enables efficient detection of the presence of signals of interest. The efficiency of the proposed detection algorithm is investigated first using "ideal" audio signals taken from publicly available databases and then using experimental audio signals acquired with a commercial microphone.

Index Terms: audio signal pattern detection, experimental validation, finite state machine (FSM), time-frequency processing.

“…Various VAD algorithms have been proposed in the literature, based on zero-crossing rates, spectral representations (LPC, LSF, etc.), statistical speech and noise modeling [1], source separation, and decision-making based on a combination of different features [2]. The algorithms perform well in quiet or high-SNR environments.…”

Section: Introduction (mentioning; confidence: 99%)
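The zero-crossing rate named in the excerpt is often paired with frame energy as a basic VAD feature. The following sketch computes both per frame; treating a sample of exactly zero as positive and using non-overlapping frames are assumptions, and the names `zero_crossing_rate` and `frame_features` are illustrative.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0.0 to 1.0).

    Assumption: a sample equal to 0 is counted as non-negative.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_features(samples, frame_len):
    """Per-frame (mean energy, zero-crossing rate) pairs over non-overlapping frames."""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        feats.append((energy, zero_crossing_rate(frame)))
    return feats


# Usage: an alternating-sign frame followed by a constant positive frame.
feats = frame_features([1, -1, 1, -1, 2, 2, 2, 2], frame_len=4)
```

As a rough heuristic, voiced speech tends to combine high energy with a low zero-crossing rate, while unvoiced sounds and broadband noise show higher rates; as the excerpt notes, such simple features work best in quiet or high-SNR conditions.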

“…I(X, S) is large. [Footnote 2: The system will, however, not work if there are any devices in the vicinity that specifically emit noise at 40 kHz.]…”

Section: Mutual Information Analysis of the Doppler Sensor (mentioning; confidence: 99%)
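The excerpt's criterion that a sensor feature X is informative about the speech state S when I(X, S) is large can be illustrated with a plug-in mutual-information estimate over discrete samples. This sketch assumes X has already been quantized into bins and S is a speech/non-speech label; how the Doppler features are actually discretized is not given here, so the inputs below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ss):
    """Plug-in estimate of I(X; S) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ss))
    px = Counter(xs)
    ps = Counter(ss)
    mi = 0.0
    for (x, s), count in joint.items():
        p_xs = count / n
        # p(x, s) * log2( p(x, s) / (p(x) p(s)) ), summed over observed pairs
        mi += p_xs * math.log2(p_xs / ((px[x] / n) * (ps[s] / n)))
    return mi


# Usage: a feature that tracks the label exactly vs. one independent of it.
labels = [0, 0, 1, 1]
informative = mutual_information([0, 0, 1, 1], labels)
uninformative = mutual_information([0, 1, 0, 1], labels)
```

A feature that perfectly tracks a balanced binary label yields 1 bit, while an independent feature yields 0. Plug-in estimates are biased upward on small samples, so in practice they would be computed over many frames.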

A robust voice activity detector using an acoustic Doppler radar

Raj

2005

IEEE Workshop on Automatic Speech Recognition and Understanding, 2005.

This paper describes a robust voice activity detector using an acoustic Doppler radar device. The sensor is used to detect the dynamic status of the speaker's mouth. At the frequencies of operation, background noises are largely attenuated, rendering the device robust to external acoustic noises in most operating conditions. Unlike other non-acoustic sensors, the device need not be taped to the speaker, making it more acceptable in most situations. In this paper, various features computed from the sensor output are exploited for voice activity detection. The best set of features is selected based on a robustness analysis. A support vector machine classifier is used to make the final speech/non-speech decision. Experimental results show that the proposed Doppler-based voice activity detector improves speech/non-speech classification accuracy over that obtained using speech alone. The most significant improvements occur in low signal-to-noise ratio (SNR) environments.
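The sensing principle behind an acoustic Doppler radar can be sketched with the first-order two-way Doppler relation Δf ≈ 2·v·f₀/c for a tone reflected off a surface moving at speed v. The 40 kHz carrier matches the operating frequency mentioned in the excerpt above; the speed of sound (343 m/s) and the 0.1 m/s lip velocity are assumed illustrative values, not figures from the paper.

```python
SPEED_OF_SOUND_M_S = 343.0  # air at roughly 20 degrees C (assumed)


def doppler_shift_hz(carrier_hz, velocity_m_s, c=SPEED_OF_SOUND_M_S):
    """Two-way Doppler shift of a tone reflected from a surface moving at velocity_m_s.

    Uses the first-order approximation 2 * v * f0 / c, valid when |v| << c.
    Positive velocity means motion toward the sensor.
    """
    return 2.0 * velocity_m_s * carrier_hz / c


# Usage: hypothetical 0.1 m/s mouth movement illuminated by a 40 kHz tone.
shift = doppler_shift_hz(40_000.0, 0.1)
```

The resulting shift is only about 23 Hz, so mouth movement appears as a narrow band of energy around the ultrasonic carrier, well above the range where most speech and everyday background noise energy lies.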
