Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) 2014
DOI: 10.3115/v1/w14-4341

Comparative Error Analysis of Dialog State Tracking

Abstract: A primary motivation of the Dialog State Tracking Challenge (DSTC) is to allow for direct comparisons between alternative approaches to dialog state tracking. While results from DSTC 1 mention performance limitations, an examination of the errors made by dialog state trackers was not discussed in depth. For the new challenge, DSTC 2, this paper describes several techniques for examining the errors made by the dialog state trackers in order to refine our understanding of the limitations of various approaches to…

Cited by 12 publications (9 citation statements)
References 7 publications
“…To our knowledge only one example of a comparative error analysis of dialogue state trackers was given by Smith [12]. In that analysis the author reports the error distributions of tracking models over three error types of possible deviations from the true joint goal for every turn in the dialogue.…”
Section: Error Analysis (mentioning)
confidence: 99%
“…As pointed out by Smith (2014), it is important to examine the types of errors made by a tracker in order to make improvements. To do this, at each turn, we compare the top dialog state output by each tracker with the true dialog state, and examine each slot.…”
Section: What Types Of Errors Do the Trackers Make? (mentioning)
confidence: 99%
“…Description based on survey collected from participants. (Kim and Banchs, 2014) Linear CRF team3 entry0 (Smith, 2014) Discourse rules + dialog act bigrams team4 entry2 (Henderson et al, 2014d) Recurrent neural network team6 entry2…”
Section: Entry (mentioning)
confidence: 99%
“…One such example can be "asian food" which appears 16 times in the training data as a part of the best ASR hypothesis while 13 times it really informs about "asian oriental" ontology value. Measurements on dstc2 dev have shown [Table 1 rows: Williams (2014) .739 .721; Henderson et al (2014b) .737 .406; Knowledge-based tracker (Kadlec et al, 2014) .737 .429; √ et al (2014) .735 .433; Smith (2014) .729 .452; Lee et al (2014) .726 .427; YARBUS (Fix and Frezza-buet, 2015) .] Table 1: Joint slot tracking accuracy and L2 (denotes the squared L2 norm between the estimated belief distribution and correct distribution) for various systems reported in the literature.…”
Section: Lessons Learned (mentioning)
confidence: 99%