“…Several systems consider only the magnitude spectrograms, such as [52], [140], [199], [204], while others consider only the phase spectrograms [128], [203]. When both magnitude and phase are considered, they can also be stacked along a third dimension, as is done for channels. This representation has been employed in many neural-based SSL systems [41], [70], [131], [143], [147], [148], [152], [153], [187]. Other systems decompose the complex-valued spectrograms into real and imaginary parts [42], [119], [192], [205].…”
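
The two input representations described in this passage can be illustrated with a minimal NumPy sketch. It is not taken from any of the cited systems: the channels-first layout `(channels, frequency, time)`, the array sizes, the random stand-in for a multichannel STFT, and the helper names `stack_mag_phase` and `stack_real_imag` are all assumptions made for illustration.

```python
import numpy as np


def stack_mag_phase(stft):
    """Stack magnitude and phase along the first (channel-like) axis.

    `stft` is assumed to be a complex array of shape (channels, freq, time).
    The result has shape (2 * channels, freq, time): magnitudes first,
    then phases in radians.
    """
    magnitude = np.abs(stft)
    phase = np.angle(stft)
    return np.concatenate([magnitude, phase], axis=0)


def stack_real_imag(stft):
    """Stack real and imaginary parts along the first (channel-like) axis."""
    return np.concatenate([stft.real, stft.imag], axis=0)


# Hypothetical sizes: 4 microphones, 257 frequency bins, 100 time frames.
n_mics, n_freq, n_frames = 4, 257, 100

# Random complex values stand in for a real multichannel STFT.
rng = np.random.default_rng(0)
stft = (rng.standard_normal((n_mics, n_freq, n_frames))
        + 1j * rng.standard_normal((n_mics, n_freq, n_frames)))

mag_phase = stack_mag_phase(stft)
real_imag = stack_real_imag(stft)

print(mag_phase.shape)  # (8, 257, 100)
print(real_imag.shape)  # (8, 257, 100)
```

Both variants yield a real-valued tensor with twice as many slices along the stacking axis as the complex input, and both retain the full complex spectrogram: the input can be recovered as `magnitude * exp(1j * phase)` or as `real + 1j * imag`. A magnitude-only or phase-only input, by contrast, keeps just one half of the first tensor.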