We consider the task of speech source localization using binaural cues, namely interaural time and level differences (ITD and ILD). A typical approach is to process binaural speech using gammatone filters and calculate frame-level ITD and ILD in each subband. The ITD, ILD and their combination (ITLD) in each subband are statistically modelled using Gaussian mixture models (GMMs) for every direction during training. Given a binaural test speech signal, the source is localized using the maximum likelihood criterion, assuming that the binaural cues in each subband are independent. In this work, we investigate the robustness of each subband for localization and compare their performance against the full-band scheme with 32 gammatone filters. We propose a subband selection procedure using the training data, in which subbands are rank-ordered based on their localization performance. Experiments on Subject 003 from the CIPIC database reveal that, for high SNRs, the ITD and ITLD of just one subband centered at 296 Hz are sufficient to yield localization accuracy identical to that of the full-band scheme with a test speech signal of duration 1 s. At low SNRs, for ITD, the selected subbands are found to perform better than the full-band scheme.

Index Terms: gammatone filters, interaural time difference, interaural level difference

Direction-of-arrival (DoA) estimation consists of the following steps. First, the binaural speech is processed through a set of gammatone filters, followed by frame-level ITD and ILD computation in each subband. These binaural features are then processed through GMMs trained on each subband for each direction. The direction with the maximum likelihood is the DoA estimate. We provide the details of these steps in the following subsections.

2.1. Gammatone Filters

The binaural signals are processed through N = 32 fourth-order gammatone filters. Their center frequencies are equally dis
We derive an expression for the correction to the spin-system Hamiltonian that arises due to the system-bath interaction, starting both from the standard master equation for the spin density matrix and from a perturbative diagonalization of the system-bath Hamiltonian to second order in the interaction. We show that the dynamic frequency shifts observed in the evolution of the nuclear spin coherences are a result of these Hamiltonian corrections. We present a systematic decomposition of the relaxation superoperator into Hermitian and anti-Hermitian parts, as opposed to the usual practice of partitioning it into real and imaginary parts. We point out that the relaxation-induced corrections to the coherent motion arise exclusively from the anti-Hermitian part, and the dissipative effects from the Hermitian part, both parts being complex in general. However, the secular terms of this correction are found to depend only on the imaginary and the real parts, respectively.
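As a generic illustration of the decomposition mentioned above (not the paper's own derivation), any operator splits uniquely into a Hermitian and an anti-Hermitian part. The symbol $\hat{\Gamma}$ for the relaxation superoperator and the subscripts $H$ and $A$ are notational assumptions made here, not taken from the source:

$$
\hat{\Gamma} = \hat{\Gamma}_H + \hat{\Gamma}_A, \qquad
\hat{\Gamma}_H = \tfrac{1}{2}\left(\hat{\Gamma} + \hat{\Gamma}^{\dagger}\right), \qquad
\hat{\Gamma}_A = \tfrac{1}{2}\left(\hat{\Gamma} - \hat{\Gamma}^{\dagger}\right),
$$

with $\hat{\Gamma}_H^{\dagger} = \hat{\Gamma}_H$ and $\hat{\Gamma}_A^{\dagger} = -\hat{\Gamma}_A$. This differs from the partition $\hat{\Gamma} = \operatorname{Re}\hat{\Gamma} + i\,\operatorname{Im}\hat{\Gamma}$ taken element by element in a chosen basis: a Hermitian operator can have complex off-diagonal matrix elements, which is consistent with the statement that both parts are complex in general.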
We consider the task of speech source localization from a binaural recording using interaural time difference (ITD). A typical approach is to process binaural speech using gammatone filters and calculate frame-level ITD in each subband. The ITDs in each gammatone subband are statistically modelled using Gaussian mixture models (GMMs) for every direction during training. Given a binaural test speech signal, the source is localized using the maximum likelihood (ML) criterion. In this work, we propose a subband weighting scheme in which subband likelihoods are weighted based on their reliability. We measure the reliability of a subband using the average frame-level localization error obtained for that subband. These reliability values are used as the weights for each subband likelihood prior to combining the likelihoods for ML estimation. We also introduce non-linear warping of these weights to accommodate and analyse a larger space of possible subband weights. Experiments on Subject 003 from the CIPIC database reveal that weighting the subbands is better than the unweighted scheme of combining likelihoods.
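The ML combination of subband likelihoods described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gmm_loglik` and `localize`, the model layout `models[d][b]`, and the choice to apply each weight as a multiplier on the subband log-likelihood are all hypothetical. How reliability is mapped to a weight, and the non-linear warping, are not specified in the abstract and are left out.

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Per-frame log-likelihood of 1-D samples x under a 1-D GMM."""
    x = np.asarray(x, dtype=float)[:, None]           # (frames, 1)
    weights = np.asarray(weights, dtype=float)        # (components,)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Log of each weighted Gaussian component, shape (frames, components).
    log_comp = (np.log(weights)
                - 0.5 * np.log(2.0 * np.pi * variances)
                - 0.5 * (x - means) ** 2 / variances)
    # Log-sum-exp over components for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def localize(itd, models, band_weights=None):
    """Return (index of best direction, per-direction scores).

    itd          : array (bands, frames) of frame-level ITDs.
    models       : models[d][b] = (weights, means, variances) of the GMM
                   for direction d and subband b (hypothetical layout).
    band_weights : one weight per subband; None means the unweighted
                   scheme (all weights equal to 1).
    """
    itd = np.asarray(itd, dtype=float)
    n_bands = itd.shape[0]
    if band_weights is None:
        band_weights = np.ones(n_bands)
    band_weights = np.asarray(band_weights, dtype=float)
    scores = []
    for direction_models in models:
        # Sum frame log-likelihoods within each subband.
        band_ll = np.array([gmm_loglik(itd[b], *direction_models[b]).sum()
                            for b in range(n_bands)])
        # Assumption: weights scale the subband log-likelihoods.
        scores.append(float(np.dot(band_weights, band_ll)))
    return int(np.argmax(scores)), scores

# Toy example: two directions, two subbands, single-component GMMs.
models = [
    [([1.0], [-0.3], [0.01]), ([1.0], [-0.3], [0.01])],  # direction 0
    [([1.0], [0.3], [0.01]), ([1.0], [0.3], [0.01])],    # direction 1
]
itd = [[0.3, 0.3, 0.3], [-0.3, -0.3, -0.3]]  # the two subbands disagree
best_if_band0_trusted, _ = localize(itd, models, band_weights=[1.0, 0.0])
best_if_band1_trusted, _ = localize(itd, models, band_weights=[0.0, 1.0])
```

In the toy data the two subbands point to different directions, so the estimate follows whichever subband carries the weight. With all weights equal, the score reduces to the plain sum of subband log-likelihoods, which corresponds to treating the subbands as independent.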