“…Other studies focused on the problem of speaker recognition without naming, using the speech modality as a single source of information. While some of these studies attempted to incorporate the visual modality, their goal was to cluster the speech segments rather than name the speakers (Erzin et al, 2005; Bost and Linares, 2014; Kapsouras et al, 2015; Bredin and Gelly, 2016; Hu et al, 2015; Ren et al, 2016). None of these studies used textual information (e.g., dialogue), which prevented them from identifying speaker names.…” (arXiv:1809.08761v1 [cs.CL], 24 Sep 2018)