“…Additionally, for each architecture, we tune the model parameters such as the number of CNN, RNN, and FC layers (0 to 4) and nodes per layer (in the set of [16, 32, 64, 128, 256, 512]). The input sequence length is tuned in the set of [32, 64, 128, 256, 512], the DOA and SED branch output loss weights in the set of [1, 5, 50, 500], the regularization (dropout in the set of [0, 0.1, 0.2, 0.3, 0.4, 0.5]; L1 and L2 in the set of [0, 10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6, 10^-7]), and the CNN max-pooling in the set of [2, 4, 6, 8, 16] for each layer. The best set of parameters is the one that gives the lowest SELD score on the three cross-validation splits of the dataset.…”
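The search described above can be sketched as a random search over the stated grid. This is a minimal illustration, not the authors' tuning code: the `evaluate_split` callback is a hypothetical stand-in for training the model and computing the SELD score on one cross-validation split, and the parameter names are assumptions for readability.

```python
import random

# Hyperparameter grid as listed in the excerpt.
SEARCH_SPACE = {
    "cnn_layers": [0, 1, 2, 3, 4],
    "rnn_layers": [0, 1, 2, 3, 4],
    "fc_layers": [0, 1, 2, 3, 4],
    "nodes": [16, 32, 64, 128, 256, 512],
    "seq_len": [32, 64, 128, 256, 512],
    "branch_loss_weight": [1, 5, 50, 500],   # DOA/SED output loss weights
    "dropout": [0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "l1": [0] + [10 ** -k for k in range(1, 8)],
    "l2": [0] + [10 ** -k for k in range(1, 8)],
    "max_pool": [2, 4, 6, 8, 16],            # per CNN layer
}

def sample_config(rng):
    """Draw one random configuration from the grid."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def mean_seld_score(config, evaluate_split, n_splits=3):
    """Average the SELD score over the three cross-validation splits.

    `evaluate_split(config, split)` is a hypothetical callback that would
    train the model with `config` and return its SELD score on `split`.
    """
    return sum(evaluate_split(config, s) for s in range(n_splits)) / n_splits

def random_search(evaluate_split, n_trials=20, seed=0):
    """Return the sampled config with the lowest mean SELD score (lower is better)."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return min(trials, key=lambda c: mean_seld_score(c, evaluate_split))
```

An exhaustive grid search over this space would be far too large (millions of combinations), which is why a sampled search with a fixed trial budget is a common practical choice here.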