The z-test based on the Kappa statistic is commonly used to infer the superiority of one map production method over another. Typically, the same reference data set is used to calculate and then compare the Kappa values of the two maps. This data structure easily leads to dependence between the two error matrices, which may result in overly large variance estimates and overly conservative inference about the difference in accuracy between the two methods. Tests that account for the dependency between the error matrices would be more sensitive in such cases. In this article we compare the performance of two such tests, a randomization test and McNemar's test, with the traditional z-test. We compared 16 alternative methods to classify salt marsh vegetation in The Netherlands. The error matrices were positively associated in all 120 possible comparisons of pairs of classification methods. This suggests that dependency between pairs of error matrices used in classifier comparison is a common phenomenon. Both the randomization test and McNemar's test gave lower p-values and rejected the null hypothesis of equal performance more frequently than the z-test. We therefore recommend considering their use when two maps are evaluated against the same reference data.
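To illustrate how a paired test uses the dependency between two classifications, the following is a minimal sketch rather than the procedure used in the article. It assumes each reference sample has been scored as correct or incorrect for both maps. McNemar's test is shown in its normal-approximation form without continuity correction, and the randomization test is a simple sign-flip test on the difference in overall accuracy, not on Kappa. The function names and the example counts are hypothetical.

```python
import math
import random


def mcnemar_test(correct_a, correct_b):
    """McNemar's test (normal approximation, no continuity correction).

    correct_a, correct_b: sequences of booleans, one pair per reference
    sample, stating whether map A and map B classified it correctly.
    Returns (z, two_sided_p).
    """
    # Only discordant pairs carry information about the difference.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    if only_a + only_b == 0:
        return 0.0, 1.0
    z = (only_a - only_b) / math.sqrt(only_a + only_b)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p


def randomization_test(correct_a, correct_b, n_perm=10000, seed=0):
    """Sign-flip randomization test on the paired accuracy difference.

    Under the null hypothesis the labels 'A' and 'B' are exchangeable
    within each reference sample, so each paired difference may have
    its sign flipped with probability 0.5.
    """
    rng = random.Random(seed)
    diffs = [int(a) - int(b) for a, b in zip(correct_a, correct_b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_perm):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 100 reference samples; 70 correct in both maps,
# 15 correct only in A, 5 correct only in B, 10 wrong in both.
correct_a = [True] * 70 + [True] * 15 + [False] * 5 + [False] * 10
correct_b = [True] * 70 + [False] * 15 + [True] * 5 + [False] * 10

z, p_mcnemar = mcnemar_test(correct_a, correct_b)
p_random = randomization_test(correct_a, correct_b)
print(f"McNemar z = {z:.3f}, p = {p_mcnemar:.4f}")
print(f"Randomization p = {p_random:.4f}")
```

Both functions ignore the samples on which the two maps agree, which is what makes them sensitive when the error matrices are positively associated.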