“…We report the task's five automatic metrics: BLEU (Papineni et al., 2002), NIST (Doddington, 2002), METEOR (Lavie and Agarwal, 2007), ROUGE-L (Lin, 2004) and CIDEr (Vedantam et al., 2015). Table 1 compares the performance of our base S_0 and pragmatic models to the baseline T-Gen system (Dušek and Jurčíček, 2016) and the best previous result from the 20 primary systems evaluated in the E2E challenge (Dušek et al., 2018). The systems obtaining these results encompass a range of approaches: a template system (Puzikov and Gurevych, 2018), a neural model (Zhang et al., 2018), models trained with reinforcement learning (Gong, 2018), and systems using ensembling and reranking (Juraska et al., 2018).…”