“…Other methods approach the task of ASL, which seeks to localize speakers spatially within the scene rather than classifying bounding box tracks [7,16,24,37,38,87,104,106]. Several use multichannel audio to incorporate directional audio information [7,16,24,37,38,104,106]. Recently,…”