“…They have been used with various inputs: binaural features [24], generalized cross-correlation (GCC) features [25], the eigenvectors of the spatial covariance matrix [26], and the raw short-time Fourier transform (STFT) of the signals [27]-[29], including Ambisonics signals in [29]. Different architectures have been tested: feed-forward neural networks [24], convolutional neural networks (CNNs) [27], [30], deep residual networks [31], and convolutional recurrent neural networks (CRNNs) [29]. Yet most of these methods have only been evaluated in simulated environments similar to the training conditions, which is not sufficient to verify that they generalize to real-life applications.…”