“…In music, the most useful application is separating the lead vocals from a musical mixture. This problem is well researched, and numerous deep-learning-based models have recently been proposed to tackle it [4,5,6,7,8,9,10,11]. Most of these models use a neural network to predict soft time-frequency masks, given an input magnitude spectrogram of the mixture signal.…”
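
As a rough illustration of soft time-frequency masking, the sketch below applies a mask with values in [0, 1] element-wise to a mixture magnitude spectrogram. It does not reproduce any of the cited models. The neural network is replaced here by an oracle ratio mask computed from known source magnitudes, and the names `soft_mask` and `apply_mask`, the random stand-in spectrograms, and the assumption that source magnitudes add in the mixture are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_mask(vocal_mag, accomp_mag, eps=1e-8):
    """Oracle ratio mask in [0, 1] (a stand-in for a network's predicted mask)."""
    return vocal_mag / (vocal_mag + accomp_mag + eps)


def apply_mask(mask, mixture_mag):
    """Element-wise masking of the mixture magnitude spectrogram."""
    return mask * mixture_mag


# Stand-in magnitude spectrograms: (frequency bins, time frames).
rng = np.random.default_rng(0)
vocal_mag = rng.random((513, 100))
accomp_mag = rng.random((513, 100))

# Assumption for this sketch only: source magnitudes add in the mixture.
mixture_mag = vocal_mag + accomp_mag

mask = soft_mask(vocal_mag, accomp_mag)
vocal_est = apply_mask(mask, mixture_mag)
```

In a real system the mask would come from the trained model rather than from the true sources, and the masked magnitude would typically be combined with the mixture phase and inverted back to a waveform.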