“…The present paper has a strong connection to recent work on unsupervised speech processing, especially the Zerospeech 2015 (Versteegh et al, 2015) and 2017 (Dunbar et al, 2017) shared tasks. Participating systems (Badino et al, 2015; Renshaw et al, 2015; Agenbag and Niesler, 2015; Baljekar et al, 2015; Räsänen et al, 2015; Lyzinski et al, 2015; Zeghidour et al, 2016; Heck et al, 2016; Srivastava and Shrivastava, 2016; Kamper et al, 2017b; Yuan et al, 2017; Heck et al, 2017; Shibata et al, 2017; Ansari et al, 2017a,b) perform unsupervised ABX discrimination and/or spoken term discovery on the basis of unlabeled speech alone. The design and evaluation of these and related systems (Kamper et al, , 2017a; Elsner and Shain, 2017; Räsänen et al, 2018) are oriented toward word-level modeling.…”