“…Such models are typically evaluated using Recall@k, a standard metric in the information retrieval literature. It measures how often the correct response is ranked among the top k candidate responses (Lowe et al., 2015; Inaba and Takahashi, 2016; Yu et al., 2016; Al-Rfou et al., 2016; Henderson et al., 2017; Lowe et al., 2017; Wu et al., 2017; Chaudhuri et al., 2018; Du and Black, 2018; Kumar et al., 2018; Zhou et al., 2018; Gunasekara et al., 2019; Tao et al., 2019). Models trained to select responses can be used to drive dialogue systems, question-answering systems, and response suggestion systems.…”
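As an illustration of the metric described above, the following is a minimal sketch of Recall@k for response selection. It assumes that each context comes with a fixed list of candidate responses, exactly one of which is correct, and that the model assigns a higher score to a better candidate. The function name `recall_at_k` and its arguments are hypothetical, and the choice to resolve score ties in favour of the correct response is an assumption; published evaluations may handle ties differently.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], correct: Sequence[int], k: int) -> float:
    """Fraction of contexts whose correct response is ranked within the top k.

    Hypothetical helper (illustrative only).
    scores[i][j]: model score of candidate j for context i (higher is better).
    correct[i]:   index of the true response among context i's candidates.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    if not scores:
        raise ValueError("need at least one example")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = 0
    for cand_scores, gold in zip(scores, correct):
        gold_score = cand_scores[gold]
        # 0-based rank of the gold response: the number of candidates scored
        # strictly higher. Ties therefore favour the gold response (an assumption).
        rank = sum(1 for s in cand_scores if s > gold_score)
        if rank < k:
            hits += 1
    return hits / len(scores)


if __name__ == "__main__":
    # Three toy contexts, each with three candidate responses.
    scores = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.5], [0.1, 0.2, 0.7]]
    correct = [0, 2, 0]
    for k in (1, 2, 3):
        print(f"Recall@{k} = {recall_at_k(scores, correct, k):.3f}")
```

In this toy data the correct responses are ranked first, second, and third respectively, so Recall@1, Recall@2, and Recall@3 come out to 1/3, 2/3, and 1. Counting strictly higher scores, rather than fully sorting the candidates, keeps the ranking logic explicit and costs O(N) per context.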