The most common measure of agreement for categorical data is the coefficient kappa. However, kappa performs poorly when the marginal distributions are very asymmetric, it is not easy to interpret, and its definition rests on the hypothesis of independence of the responses (which is more restrictive than the hypothesis that kappa has a value of zero). This paper defines a new measure of agreement, delta, 'the proportion of agreements that are not due to chance', which arises from a model of multiple-choice tests and does not have the previous limitations. The paper shows that kappa and delta generally take very similar values, except when the marginal distributions are strongly unbalanced. The case of 2 × 2 tables (which admits very simple solutions) is considered in detail.
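As a minimal illustration of the sensitivity to unbalanced marginals mentioned above, the following Python sketch computes Cohen's kappa from a square contingency table of counts; the delta measure itself depends on the multiple-choice model introduced in the paper and is not reproduced here.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's kappa from a square contingency table of counts.

    table[i, j] = number of items placed in category i by rater 1
    and category j by rater 2.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_obs = np.trace(table) / n               # observed proportion of agreement
    row_marg = table.sum(axis=1) / n          # rater 1 marginal proportions
    col_marg = table.sum(axis=0) / n          # rater 2 marginal proportions
    p_chance = np.dot(row_marg, col_marg)     # agreement expected under independence
    return (p_obs - p_chance) / (1.0 - p_chance)

# Strongly unbalanced marginals: raw agreement is 0.90, yet kappa is slightly negative.
unbalanced = [[90, 5],
              [5,  0]]
print(cohen_kappa(unbalanced))
```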
Statistical methods for carrying out asymptotic inferences (tests or confidence intervals) about one or two independent binomial proportions are very common. However, inferences about a linear combination of K independent proportions, L = Σᵢ βᵢpᵢ (of which the first two problems are special cases), have received very little attention, focused almost exclusively on the classic Wald method. In this paper the authors approach the problem from the more efficient viewpoint of the score method, which can be solved using a free program available from the webpage quoted in the article. In addition, the paper offers approximate formulas that are easy to calculate, gives a general proof of Agresti's heuristic method (which consists of adding a certain number of successes and failures to the original results before applying Wald's method) and, finally, shows that the score method (which satisfies the desirable properties of spatial and parametric convexity) is the best option in comparison with the other methods.
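A sketch of the classic Wald interval for L = Σᵢ βᵢpᵢ with an optional Agresti-style addition of pseudo-successes and pseudo-failures is given below; the function name and the choice of h pseudo-observations per group are illustrative assumptions, and the score method studied in the paper is not implemented here.

```python
import numpy as np
from scipy.stats import norm

def wald_ci_linear_comb(x, n, beta, conf=0.95, h=0.0):
    """Wald confidence interval for L = sum_i beta_i * p_i, where the p_i are
    K independent binomial proportions estimated from x_i successes in n_i trials.

    h > 0 adds h pseudo-successes and h pseudo-failures to every group before
    applying the Wald formula (an Agresti-style adjustment; the specific number
    recommended in the paper is not reproduced here).
    """
    x = np.asarray(x, dtype=float) + h
    n = np.asarray(n, dtype=float) + 2 * h
    beta = np.asarray(beta, dtype=float)
    p_hat = x / n
    L_hat = np.dot(beta, p_hat)
    var = np.sum(beta**2 * p_hat * (1 - p_hat) / n)
    z = norm.ppf(0.5 + conf / 2)
    half = z * np.sqrt(var)
    return L_hat - half, L_hat + half

# Example: difference of two independent proportions, L = p1 - p2.
print(wald_ci_linear_comb(x=[40, 25], n=[100, 100], beta=[1, -1], h=1.0))
```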
When studying the degree of overall agreement between the nominal responses of two raters, it is customary to use the coefficient kappa. A more detailed analysis requires the evaluation of the degree of agreement category by category, and this is carried out in two different ways: using the value of kappa in the collapsed table for each category, or using the agreement index for each category (the proportion of agreements observed). Both indices have disadvantages: the former is sensitive to marginal totals; the latter is not chance corrected; and neither distinguishes the case where one of the two raters is a gold standard (an expert) from the case where neither rater is a gold standard. This article proposes five chance-corrected indices which are not sensitive to marginal totals and which differ depending on whether there is a standard rater. The article also explains why kappa performs poorly when the two marginal totals are unbalanced (especially if they are unbalanced in opposite directions) and why it performs well when analysing the various 2 × 2 tables obtained by collapsing a wider table.
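For illustration only, the following sketch shows the first of the two customary per-category analyses: collapsing the full table to 'category k versus the rest' and computing kappa on each collapsed 2 × 2 table. The chance-corrected indices proposed in the article are not reproduced here.

```python
import numpy as np

def collapse_for_category(table, k):
    """Collapse an R x R agreement table to 2 x 2 for 'category k vs. the rest'."""
    table = np.asarray(table, dtype=float)
    a = table[k, k]                    # both raters chose category k
    b = table[k, :].sum() - a          # rater 1 chose k, rater 2 did not
    c = table[:, k].sum() - a          # rater 2 chose k, rater 1 did not
    d = table.sum() - a - b - c        # neither rater chose k
    return np.array([[a, b], [c, d]])

def kappa(table):
    """Cohen's kappa for a square contingency table of counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_obs = np.trace(table) / n
    p_chance = np.dot(table.sum(axis=1), table.sum(axis=0)) / n**2
    return (p_obs - p_chance) / (1.0 - p_chance)

# Hypothetical 3-category ratings by two raters.
ratings = np.array([[30,  4,  1],
                    [ 5, 20,  5],
                    [ 2,  3, 30]])
for k in range(3):
    print(f"category {k}: collapsed-table kappa = {kappa(collapse_for_category(ratings, k)):.3f}")
```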