Abstract: A novel method is proposed for estimating the distance of a sound source from binaural speech signals. The method relies on several statistical features extracted from these signals and their binaural cues. First, the standard deviation of the difference between the magnitude spectra of the left and right binaural signals is used as a feature. In addition, an extended set of statistical features that can improve distance detection is extracted from an auditory front-end that models the peripheral processing of the human auditory system. The method incorporates these features into two classification frameworks, based on Gaussian mixture models and Support Vector Machines, and the relative merits of the two frameworks are evaluated. The proposed method successfully detects source distance when tested in various acoustical environments and also performs well in unknown environments. Its performance is also compared to that of an existing binaural distance detection method.
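The first feature above can be sketched for a single pair of left/right frames as follows. This is a minimal illustration, not the authors' implementation: it assumes a Hann-windowed frame, expresses the magnitude difference in dB, and takes the standard deviation across frequency bins. The frame length, window, dB scale, and the helper names `hann`, `magnitude_spectrum`, and `binaural_spectral_std` are all assumptions made for illustration.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame (direct O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def binaural_spectral_std(left, right, eps=1e-12):
    """Std over frequency of the dB difference between left and right magnitude spectra.

    Hypothetical helper: the dB scale and Hann window are illustrative assumptions.
    """
    if len(left) != len(right):
        raise ValueError("left and right frames must have the same length")
    w = hann(len(left))
    mag_l = magnitude_spectrum([x * wi for x, wi in zip(left, w)])
    mag_r = magnitude_spectrum([x * wi for x, wi in zip(right, w)])
    # Per-bin interaural level difference in dB; eps guards against log of zero.
    diff_db = [20 * math.log10((a + eps) / (b + eps)) for a, b in zip(mag_l, mag_r)]
    mean = sum(diff_db) / len(diff_db)
    return math.sqrt(sum((d - mean) ** 2 for d in diff_db) / len(diff_db))
```

A purely broadband level difference between the ears (one channel a scaled copy of the other) gives a value near zero, whereas frequency-dependent differences, such as those introduced by reverberation, increase it. In practice such a value would presumably be computed per frame and aggregated before being passed to a GMM or SVM classifier.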
For some time now, statistical analysis has been a valuable tool for analyzing room transfer functions (RTFs). This work examines existing statistical time-frequency models and techniques for RTF analysis (e.g., Schroeder's stochastic model and the standard deviation of the RTF magnitude and phase over frequency bands). Fractional-octave smoothing of RTFs, as in 1/3-octave analysis, can yield simplified RTFs that are useful for several audio applications, such as room compensation, room modeling, and auralisation. The aim of this work is to identify the relationship between optimal response smoothing (e.g., complex smoothing) and the original RTF statistics. More specifically, the RTF statistics derived after complex smoothing are compared to the original statistics across space inside typical rooms, by varying the source and receiver positions and hence the corresponding direct-to-reverberant ratio. In addition, this work examines the statistical properties of speech and audio signals prior to their reproduction within rooms and when recorded in rooms. Histograms and other statistical distributions are used to compare RTF minima of typical “anechoic” and “reverberant” audio and speech signals, in order to model the alterations due to room acoustics. The above results are obtained from both in-situ room response measurements and controlled acoustical response simulations.
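To make the comparison between smoothed and original RTF statistics concrete, the sketch below applies a simple 1/3-octave smoothing to a magnitude response and measures the standard deviation of the level (in dB) before and after. It is an assumption-laden simplification: it uses a rectangular window on linear magnitude only, whereas complex smoothing, as discussed in the text, operates on the complex response. The function names `fractional_octave_smooth` and `level_std_db` are hypothetical. For reference, Schroeder's stochastic model treats the RTF magnitude above the Schroeder frequency as Rayleigh-distributed, which implies a level standard deviation of about 5.57 dB.

```python
import math


def fractional_octave_smooth(mag, fraction=3):
    """Smooth a magnitude response sampled on uniform bins 0..N-1.

    Each bin k > 0 becomes the mean linear magnitude of the bins between
    k / 2**(1/(2*fraction)) and k * 2**(1/(2*fraction)), i.e. a rectangular
    1/fraction-octave window geometrically centred on k (a simplified sketch).
    """
    half_band = 2 ** (1 / (2 * fraction))
    out = [mag[0]]  # DC has no octave neighbourhood, so it is kept as-is
    for k in range(1, len(mag)):
        lo = max(1, math.ceil(k / half_band))
        hi = min(len(mag) - 1, math.floor(k * half_band))
        band = mag[lo:hi + 1]  # always contains bin k itself
        out.append(sum(band) / len(band))
    return out


def level_std_db(mag, eps=1e-12):
    """Standard deviation (dB) of the level 20*log10|H| across the given bins."""
    levels = [20 * math.log10(m + eps) for m in mag]
    mean = sum(levels) / len(levels)
    return math.sqrt(sum((v - mean) ** 2 for v in levels) / len(levels))
```

Feeding a synthetic Rayleigh-distributed magnitude response (the magnitude of a complex Gaussian per bin) into `level_std_db` should give roughly 5.57 dB, and its smoothed version a markedly lower value; measured RTFs would of course replace the synthetic data in a real study.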
Single-channel spectral subtraction algorithms are commonly used to suppress late reverberation. A binaural extension of such methods should not only suppress reverberation without introducing processing artifacts but also preserve the signal's binaural localization cues. Here, three state-of-the-art spectral subtraction dereverberation algorithms are extended to a binaural context using three alternative bilateral gain adaptation schemes, and are compared to an extension derived from a delay-and-sum beamformer. Objective results for several experimental conditions reveal the best-performing binaural extensions.
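The idea behind a bilateral gain adaptation can be sketched as follows: each ear's spectral subtraction gain is computed from its own observed power and an estimate of its late-reverberation power, and the two gains are then merged into one common gain applied to both ears. Because both channels are scaled by the same real-valued gain in each bin, the interaural level ratio is unchanged and the phases (hence time differences) are untouched. The abstract does not specify the three schemes, so the `"average"`, `"min"`, and `"max"` rules below are illustrative assumptions, as are the basic power spectral subtraction rule, the gain floor, and all function names; the late-reverberation power estimate is simply taken as an input.

```python
import math


def spectral_subtraction_gain(power_obs, power_late, gain_floor=0.1):
    """Magnitude gain per bin: sqrt(max(1 - P_late/P_obs, 0)), floored at gain_floor."""
    gains = []
    for p_obs, p_late in zip(power_obs, power_late):
        ratio = p_late / p_obs if p_obs > 0 else 1.0
        gains.append(max(math.sqrt(max(1.0 - ratio, 0.0)), gain_floor))
    return gains


# Hypothetical bilateral schemes that merge left/right gains into one common gain.
BILATERAL_SCHEMES = {
    "average": lambda a, b: 0.5 * (a + b),
    "min": min,  # strongest suppression of the two ears
    "max": max,  # most conservative suppression of the two ears
}


def bilateral_gain(g_left, g_right, scheme="average"):
    if scheme not in BILATERAL_SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    combine = BILATERAL_SCHEMES[scheme]
    return [combine(a, b) for a, b in zip(g_left, g_right)]


def dereverberate_binaural(mag_left, mag_right, late_left, late_right, scheme="average"):
    """Apply one common gain per bin to both ears so the interaural level ratio is kept."""
    g_left = spectral_subtraction_gain([m * m for m in mag_left], late_left)
    g_right = spectral_subtraction_gain([m * m for m in mag_right], late_right)
    g = bilateral_gain(g_left, g_right, scheme)
    out_left = [m * gi for m, gi in zip(mag_left, g)]
    out_right = [m * gi for m, gi in zip(mag_right, g)]
    return out_left, out_right
```

Choosing `"min"` trades more reverberation suppression for a higher risk of artifacts, while `"max"` does the opposite; applying independent per-ear gains instead would generally distort the interaural level differences.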