“…There exists a large body of work in linguistics on different notions of coherence, such as the influence of coreference (Hobbs, 1979; Barzilay and Lapata, 2008, inter alia), Centering theory (Grosz et al., 1995), discourse structure (Mann and Thompson, 1987; Webber et al., 2003), and phenomena that connect utterances in dialogue, such as conversational maxims (Grice, 1975) or speaker interaction (Lascarides and Asher, 2009). Many of these are also mentioned in coherence evaluation studies; nonetheless, these studies mostly revert to some form of sentence-order variation (Chen et al., 2019; Moon et al., 2019; Mesgar et al., 2020). While some progress has been made towards incorporating more linguistically motivated test sets (Chen et al., 2019; Mohammadi et al., 2020; Pishdad et al., 2020), most evaluation studies focus on models trained specifically on coherence classification and prediction tasks.…”