…Conversational Speech Translation. Work on conversational ST (Kumar et al., 2014b,a; Zanon Boito et al., 2022) has mainly focused on single-speaker speech, segmented either manually or automatically via voice activity detection. Recent studies based on the Fisher and CALLHOME corpora assumed manual segmentation. These include work on cascaded ST (Kumar et al., 2014b), E2E-ST (Weiss et al., 2017; Peng et al., 2023), simultaneous ASR and ST (Soky et al., 2022), streamed ST (Deng et al., 2022), and multilingual ST (Inaguma et al., 2019).…