“…Word sequences are the most frequently used cue (e.g., Dai et al., 2020; Grau et al., 2004; Kalchbrenner & Blunsom, 2013; Stolcke et al., 2000), likely because many dialog act classification frameworks heavily rely on lexical information (Duran & Battle, 2018; Jurafsky et al., 1997; Louwerse & Crossley, 2006). In frequency-based and machine learning approaches, word sequences have typically been encoded using n-grams (e.g., Garner et al., 1996; Grau et al., 2004; Ribeiro et al., 2015; Louwerse & Crossley, 2006). Deep learning methods have instead typically used recurrent neural networks to encode such sequences (e.g., Dai et al., 2020; Ji et al., 2016; Tran et al., 2017b; Zhao & Kawahara, 2019).…”
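To make the contrast concrete, the sketch below illustrates the first, n-gram-based encoding route for dialog act classification in a minimal form; it is not drawn from any of the cited works, and the utterances, labels, and model choice (a logistic regression over uni- and bi-gram counts) are illustrative assumptions. The deep-learning route mentioned in the passage would instead feed word embeddings through a recurrent encoder such as a GRU or LSTM.

```python
# Minimal sketch of n-gram-based dialog act classification (illustrative data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy utterances with made-up dialog act labels.
utterances = ["how are you", "i am fine thanks", "could you repeat that", "yes of course"]
acts = ["question", "statement", "question", "answer"]

# Frequency-based encoding: each utterance becomes a vector of uni- and bi-gram counts.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(utterances)

# Classical machine learning classifier over the n-gram features.
clf = LogisticRegression(max_iter=1000).fit(X, acts)
print(clf.predict(vectorizer.transform(["are you fine"])))
```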