Towards acoustically robust localization of speakers in a reverberant environment

Rafaely, Boaz; Kolossa, Dorothea; Maymon, Yanir

doi:10.1109/hscma.2017.7895569

Cited by 9 publications

(6 citation statements)

References 13 publications


“…These parameters can be estimated using FOA signals. For example, the DOA can be estimated as originally suggested in [5] or using more advanced algorithms [23], [24], [32], and the DRR can be estimated using methods that were recently studied in the ACE challenge [33].…”

Section: Application To Directional Audio Coding (mentioning)

confidence: 99%

The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech

Madmoni, Tibor, Nelken, et al. 2021

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite


The perception of sound in real-life acoustic environments, such as enclosed rooms or open spaces with reflective objects, is affected by reverberation. Hence, reverberation is extensively studied in the context of auditory perception, with many studies highlighting the importance of the direct sound for perception. Based on this insight, speech processing methods often use time-frequency (TF) analysis to detect TF bins that are dominated by the direct sound, and then use the detected bins to reproduce or enhance the speech signals. The detection of bins dominated by the direct sound is typically based on an objective measure, such as the direct-to-reverberant ratio (DRR). However, the relation between the DRR in the TF bins and the spatial perception of the reverberant sound which is reproduced from these bins is still not clear. It is the aim of this paper to provide some insights into this relation, specifically for reverberant speech, focusing on bins with high DRR. This is performed using a listening experiment, where high DRR bins within a reverberant speech signal have been masked in the TF domain, based on various DRR thresholds. The results show that the percentage of high-DRR TF bins that were masked may better indicate the quality of spatial perception, compared to the specific value of the DRR threshold. The insights from this work could be incorporated into spatial audio techniques that reproduce the direct sound of reverberant speech, and potentially improve spatial perception. This was illustrated with an implementation of directional audio coding that was studied with an additional listening experiment supporting the previously described results.
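
The masking step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a per-bin DRR estimate in dB is already available (the array `drr_db` here is synthetic), and the function name `mask_high_drr_bins` is hypothetical. It zeroes the bins whose DRR exceeds a threshold and reports the percentage of bins that were masked, which is the quantity the abstract relates to perceived spatial quality.

```python
import numpy as np

def mask_high_drr_bins(stft, drr_db, threshold_db):
    """Zero the TF bins whose DRR (in dB) exceeds threshold_db.

    stft   : complex array (freq_bins, frames), the reverberant signal
    drr_db : real array of the same shape, an assumed per-bin DRR estimate
    Returns the masked STFT and the percentage of bins that were masked.
    """
    mask = drr_db > threshold_db
    masked = np.where(mask, 0.0, stft)
    percent_masked = 100.0 * mask.mean()
    return masked, percent_masked

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (257, 100)
    stft = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    drr_db = rng.normal(0.0, 6.0, size=shape)  # synthetic DRR values, for illustration only
    for thr in (0.0, 6.0, 12.0):
        _, pct = mask_high_drr_bins(stft, drr_db, thr)
        print(f"threshold {thr:5.1f} dB -> {pct:.1f}% of bins masked")
```

Reporting the masked percentage alongside the threshold makes it possible to compare conditions on either basis.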



“…Several approaches for estimating the speaker DoA from the selected bins have been proposed, including MUSIC with coherent and incoherent integration of the signal subspaces from the different bins [9], and bin-wise DoA estimation followed by statistical analysis to fuse the estimates [26]-[29].…”

Section: Application To Speaker Localization (mentioning)

confidence: 99%

Focusing and Frequency Smoothing for Arbitrary Arrays With Application to Speaker Localization

Beit-On, Rafaely 2020

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite


The coherent signal subspace method (CSSM) enables the direction-of-arrival (DoA) estimation of coherent sources with subspace localization methods. The focusing process that aligns the signal subspaces within a frequency band to its central frequency is central to the CSSM. Within current focusing approaches, a direction-independent focusing approach may be more suitable for reverberant environments since no initial estimation of the sources' DoAs is required. However, these methods use integrals over the steering function, and cannot be directly applied to arrays around complex scattering structures, such as robot heads. In this paper, current direction-independent focusing methods are extended to arrays for which the steering function is available only for selected directions, typically in a numerical form. Spherical harmonics decomposition of the steering function is then employed to formulate several aspects of the focusing error. A case of two coherent sources is studied and guidelines for the selection of the frequency smoothing bandwidth are suggested. The performance of the proposed methods is then investigated for an array that is mounted on a robot head. The focusing process is integrated within the direct-path dominance (DPD) test method for speaker localization, originally designed for spherical arrays, extending its application to arrays with arbitrary configurations. Finally, experiments with real data verify the feasibility of the proposed method to successfully estimate the DoAs of multiple speakers under real-world conditions.
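
For readers unfamiliar with focusing, the textbook form of the CSSM step mentioned in the abstract above can be written as below. The symbols are assumptions chosen for illustration and are not taken from the paper: T(f_j) is a focusing matrix, A(f) the steering matrix, R_x(f_j) the spatial correlation matrix at frequency f_j, f_0 the central frequency, and J the number of frequencies in the band. The 1/J normalization is a convention.

```latex
% Focusing: map the steering matrix at each frequency f_j onto the one at f_0,
% then average the focused correlation matrices over the band (frequency smoothing).
\begin{align}
  \mathbf{T}(f_j)\,\mathbf{A}(f_j) &\approx \mathbf{A}(f_0), \qquad j = 1,\dots,J, \\
  \tilde{\mathbf{R}} &= \frac{1}{J}\sum_{j=1}^{J}
      \mathbf{T}(f_j)\,\mathbf{R}_x(f_j)\,\mathbf{T}^{H}(f_j).
\end{align}
```

Subspace methods such as MUSIC are then applied to the smoothed matrix at f_0. How T(f_j) is designed (direction-dependent or direction-independent) is the subject of the cited paper and is not specified here.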


“…In the case of a single source, the final DOA estimate can be computed as the mean of Ω_coh. Alternatively, clustering the DOAs in Ω_coh can be applied to eliminate outliers, or, in the case of multiple speakers, to estimate the DOA of each speaker [11], [32]. Because room reflections are coherent with the direct sound, bins that contain direct sound and reflections may still have a rank close to one, potentially degrading the performance of the coherence test, leading to errors under reverberation [11].…”

Section: A. The Coherence Test (mentioning)

confidence: 99%
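
The single-source fusion step quoted above (taking the mean of the per-bin DOA estimates) can be sketched as follows. The quoted text does not say how the mean is computed; this sketch assumes a vector mean, averaging the unit vectors of the estimates and converting back to angles, which avoids the wrap-around problem of averaging azimuths directly. The function name `mean_doa` and the angle convention (polar angle `theta` from the z-axis, azimuth `phi`) are assumptions.

```python
import numpy as np

def mean_doa(theta, phi):
    """Fuse per-bin DOA estimates by averaging their unit vectors.

    theta : polar angles in radians (0 at the z-axis)
    phi   : azimuth angles in radians
    Returns (theta_mean, phi_mean) in radians.
    """
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # Convert each estimate to a Cartesian unit vector and average.
    v = np.array([
        np.mean(np.sin(theta) * np.cos(phi)),
        np.mean(np.sin(theta) * np.sin(phi)),
        np.mean(np.cos(theta)),
    ])
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("estimates cancel out; mean direction is undefined")
    v /= norm
    return np.arccos(np.clip(v[2], -1.0, 1.0)), np.arctan2(v[1], v[0])

if __name__ == "__main__":
    t, p = mean_doa([np.pi / 2, np.pi / 2], [0.1, -0.1])
    print(np.degrees(t), np.degrees(p))
```

Clustering, as the quoted passage suggests for outlier removal or multiple speakers, would replace this single average with one average per cluster.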

“…The aim of the analysis presented in this subsection is to provide further insight into the proposed test, by presenting the selected TF bins on top of speech spectrograms, using maps referred to as DPD maps [32]. Once again, the threshold for each test was chosen such that the percentage of TF bins that pass each test will be 10.5%, to support a common basis for comparison.…”

Section: Speech Spectrograms and DPD Maps (mentioning)

confidence: 99%

“…Several extensions have been developed for this method, including Gaussian mixture modeling (GMM) to eliminate outliers in the DOA estimation [14], and improve the robustness to challenging acoustic environments [15]. The computational cost of these methods is relatively high, since they require eigendecomposition of the spatial spectrum matrix at each TF bin.…”

Section: Introduction (mentioning)

confidence: 99%


Direction of Arrival Estimation for Reverberant Speech Based on Enhanced Decomposition of the Direct Sound

Madmoni, Rafaely 2019

IEEE J. Sel. Top. Signal Process.

Self Cite


Direction of arrival (DOA) estimation for speech sources is an important task in audio signal processing. This task becomes a challenge in reverberant environments, which are typical to real scenarios. Several methods of DOA estimation for speech sources have been developed recently, in an attempt to overcome the effect of reverberation. One effective approach aims to identify time-frequency bins in the short time Fourier transform domain that are dominated by the direct sound. This approach was shown to be particularly adequate for spherical arrays, with processing in the spherical harmonics domain. The direct-path dominance (DPD) test, and a method which is based on the directivity of the sound field are recent examples. While these methods seem to perform well, high reverberation conditions may degrade their performance. In this paper, the structure of the spatial correlation matrix is comprehensively studied, showing that under some well-defined conditions, the DOA of the direct sound can be correctly extracted from its dominant eigenvector, even when contaminated by reflections. This new insight leads to the development of a new test, performing an enhanced decomposition of the direct sound (EDS), denoted the DPD-EDS test. The proposed test is compared to previous DPD tests, and to other recently proposed reverberation-robust methods, using computer simulations and an experimental study, demonstrating its potential advantage. The studies include multiple speakers in highly reverberant environments, therefore representing challenging real-life acoustics scenes.
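
The idea of selecting bins dominated by the direct sound, referred to in the abstract above, can be illustrated with a generic rank-one check on a local spatial correlation matrix. This is a simplified sketch and not the DPD-EDS test proposed in the paper: the function name `rank_one_check`, the eigenvalue-ratio criterion, and the threshold value are assumptions, and the input is taken to be a set of multichannel snapshots from neighboring TF bins.

```python
import numpy as np

def rank_one_check(snapshots, ratio_threshold=10.0):
    """Check whether a local spatial correlation matrix is close to rank one.

    snapshots : complex array (channels, n_snapshots), with at least 2 channels,
                taken from neighboring TF bins
    Returns (passes, dominant_eigenvector, eigenvalue_ratio).
    """
    X = np.asarray(snapshots)
    R = X @ X.conj().T / X.shape[1]        # sample spatial correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)   # Hermitian input, ascending eigenvalues
    lam1, lam2 = eigvals[-1], eigvals[-2]
    ratio = lam1 / max(lam2, np.finfo(float).tiny)  # guard against a zero denominator
    return ratio > ratio_threshold, eigvecs[:, -1], ratio

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = np.array([1, 1j, -1, -1j]) / 2     # illustrative unit-norm steering vector
    s = rng.standard_normal(20) + 1j * rng.standard_normal(20)
    noise = 1e-3 * (rng.standard_normal((4, 20)) + 1j * rng.standard_normal((4, 20)))
    passes, u, ratio = rank_one_check(np.outer(a, s) + noise)
    print(passes, abs(np.vdot(a, u)))
```

As one of the citation statements on this page notes, reflections that are coherent with the direct sound can also yield a near rank-one matrix, so a check of this kind can pass bins that are contaminated by reflections.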

