“…This has had unforeseen benefits: first, the DSTC data now forms a sort of benchmark for the field, with groups continuing to report results on it after the challenge proper (Lee, 2013; Ma and Fosler-Lussier, 2014b; Zilka and Jurčíček, 2015; Fix and Frezza-Buet, 2015). In addition, the DSTC1-3 corpora have been used to examine which state tracking evaluation metrics correlate with dialog success (Lee, 2014), to perform detailed error analyses of state trackers (Smith, 2014), and to support dialog act classification and SLU experimentation (Ma and Fosler-Lussier, 2014a; Ferreira et al., 2015). We encourage future challenges to continue this tradition.…”