Computer professionals need robust, easy-to-use usability evaluation methods (UEMs) to help them systematically improve the usability of computer artifacts. However, cognitive walkthrough (CW), heuristic evaluation (HE), and thinking-aloud study (TA), three of the most widely used UEMs, suffer from a substantial evaluator effect in that multiple evaluators evaluating the same interface with the same UEM detect markedly different sets of problems. A review of 11 studies of these three UEMs reveals that the evaluator effect exists for both novice and experienced evaluators, for both cosmetic and severe problems, for both problem detection and severity assessment, and for evaluations of both simple and complex systems. The average agreement between any two evaluators who have evaluated the same system using the same UEM ranges from 5% to 65%, and none of the three UEMs is consistently better than the others. Although evaluator effects of this magnitude may not be surprising for a UEM as informal as HE, it is certainly notable that a substantial evaluator effect persists for evaluators who apply the strict procedure of CW or observe users thinking out loud. Hence, it is highly questionable to treat a TA with one evaluator as an authoritative statement about what problems an interface contains. Generally, the application of the UEMs is characterized by (a) vague goal analyses leading to variability in the task scenarios, (b) vague evaluation procedures leading to anchoring, or (c) vague problem criteria leading to anything being accepted as a usability problem, or all of these. The simplest way of coping with the evaluator effect, which cannot be completely eliminated, is to involve multiple evaluators in usability evaluations.
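For concreteness, the following is a minimal sketch of how an any-two-evaluator agreement figure of this kind could be computed, assuming agreement for a pair of evaluators is taken as the size of the intersection of their problem sets divided by the size of the union, averaged over all pairs; the evaluator data and problem identifiers are hypothetical, not taken from the reviewed studies.

```python
from itertools import combinations

def any_two_agreement(problem_sets):
    """Average pairwise agreement between evaluators' problem sets.

    Agreement for a pair is |intersection| / |union| of the two sets
    (a Jaccard-style measure), averaged over all evaluator pairs.
    """
    pairs = list(combinations(problem_sets, 2))
    scores = [len(a & b) / len(a | b) for a, b in pairs if a | b]
    return sum(scores) / len(scores)

# Hypothetical problem sets reported by three evaluators of the same interface.
evaluators = [
    {"P1", "P2", "P3", "P5"},
    {"P2", "P3", "P6"},
    {"P1", "P2", "P7", "P8"},
]

print(f"Average any-two agreement: {any_two_agreement(evaluators):.0%}")
```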
Usability studies are commonly used in industry and applied in research as a yardstick for other usability evaluation methods. Although usability studies have been investigated extensively, one potential threat to their reliability has been left virtually untouched: the evaluator effect. In this study, four evaluators individually analyzed four videotaped usability test sessions. Only 20% of the 93 detected problems were detected by all evaluators, and 46% were detected by only a single evaluator. From the total set of 93 problems, the evaluators individually selected the ten problems they considered most severe. None of the selected severe problems appeared on all four evaluators' top-10 lists, and four of the 11 problems that were considered severe by more than one evaluator were detected by only one or two evaluators. Thus, both detection of usability problems and selection of the most severe problems are subject to considerable individual variability.
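As an illustration of how detection-overlap figures of this kind can be derived from raw evaluator data, here is a small sketch that tallies, for each problem, how many evaluators detected it and then reports the share detected by all evaluators versus by only one; the evaluator-to-problem assignments are hypothetical, not the study's data.

```python
from collections import Counter

# Hypothetical mapping: evaluator -> set of problem IDs detected.
detections = {
    "E1": {"P1", "P2", "P3"},
    "E2": {"P1", "P4"},
    "E3": {"P1", "P2", "P5"},
    "E4": {"P1", "P6"},
}

n_evaluators = len(detections)
counts = Counter(p for probs in detections.values() for p in probs)

total = len(counts)
by_all = sum(1 for c in counts.values() if c == n_evaluators)
by_one = sum(1 for c in counts.values() if c == 1)

print(f"Detected by all evaluators: {by_all / total:.0%}")
print(f"Detected by a single evaluator: {by_one / total:.0%}")
```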