“…At the same time, the field of visually assisted source separation has emerged [10], [44], [45], [46], [47], with a particular focus on musical data [2], [7], [8], [9], [10]. Early approaches captured only visual appearance features [7], [8], [9], [10]; there has since been a shift towards capturing and integrating motion data [2].…”