2006
DOI: 10.1162/coli.2006.32.2.263

The PARADISE Evaluation Framework: Issues and Findings

Abstract: There has been a great deal of interest over the past 20 years in developing metrics and frameworks for evaluating and comparing the performance of spoken-language dialogue systems. One of the results of this interest is a potential general methodology, known as the PARADISE framework. This squib highlights some important issues concerning the application of PARADISE that have, up to now, not been sufficiently emphasized or have even been neglected by the dialogue-system community. These include considerations…

citations
Cited by 39 publications
(26 citation statements)
references
References 7 publications
“…(See [70] for criticisms.) Equations were developed to predict dialog efficiency (which depends on mean elapsed time and the mean number of user moves) and dialog quality costs (which depends on the number of missing responses, the number of errors, and many other factors, and task success, measured by the Kappa coefficient and defined below):…”
Section: What Should Be Measured? (mentioning)
confidence: 99%
“…On the one hand, this could indicate that the selected individual user-satisfaction measures really measure the performance of the dialogue manager and consequently illustrate the obvious difference between both dialogue-management manners. On the other hand, one could argue that this simply means that the individual user-satisfaction measures are not appropriate measures of attitude because people are likely to vary in the way they interpret the item wording [9]. Though, due to the huge difference in significance, this seems an unlikely explanation.…”
Section: Performance Function Results (mentioning)
confidence: 99%
“…Some important PARADISE details and issues were, however, highlighted by Hajdinjak and Mihelič [9]. Applying PARADISE to dialogue data requires the dialogue corpora to be collected via controlled experiments during which users subjectively rate their satisfaction.…”
Section: Introduction (mentioning)
confidence: 99%
“…Hassel and Hagen [7] investigated feature selection and introduced indirect parameters into AVMs to overcome the drawbacks of PARADISE. Hajdinjak and Mihelič [8] discussed regression parameter selection and related normalization.…”
Section: Related Work (mentioning)
confidence: 99%