“…Since the early 2000s, several groups have built systems for scoring less constrained and more unpredictable speaking items; these systems incorporated additional sources of information for scoring, such as diversity of vocabulary or grammatical complexity (Bernstein et al.; Chen & Zechner; Strik, Van De Loo, Van Doremalen, & Cucchiarini; Yoon, Bhat, & Zechner; Zechner, Higgins, Xi, & Williamson). Recent work has also looked at evaluating the content relevance of spoken responses (Loukina, Zechner, & Chen; Somasundaran, Lee, Chodorow, & Wang; Xie, Evanini, & Zechner).…”