Permutation tests are a paradox of old and new. They pre-date most traditional parametric statistics, yet have only recently become part of the mainstream discussion of statistical testing. Permutation tests follow a permutation, or 'conditional on errors', model in which a test statistic is computed on the observed data and then one of three things is done: (1) the data are permuted over all possible arrangements (an exact permutation test); (2) the data are used to calculate the exact moments of the permutation distribution (a moment approximation permutation test); or (3) the data are permuted over a subset of all possible arrangements (a resampling approximation permutation test). The earliest permutation tests date from the 1920s, but it was not until the advent of modern computing that they became a practical alternative to parametric statistical tests. In recent years, permutation analogs of existing statistical tests have been developed. These permutation tests offer noteworthy advantages over their parametric counterparts for small samples and populations, or when distributional assumptions cannot be met. Unique permutation tests have also been developed that allow the use of Euclidean distance rather than the squared Euclidean distance typically employed in parametric tests. This overview provides a chronology of the development of permutation tests, accompanied by a discussion of the advances in computing that made them feasible. Attention is paid to the important differences between 'population models' and 'permutation models', and between tests based on Euclidean and squared Euclidean distances.
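The exact and resampling variants described above can be illustrated with a minimal Python sketch. It is not taken from the overview: the two-sample setting, the absolute difference in means as the test statistic, and the function names `exact_permutation_pvalue` and `resampling_permutation_pvalue` are all assumptions chosen for illustration. The moment approximation variant is not shown.

```python
import itertools
import random


def mean_diff(x, y):
    """Difference in sample means (the illustrative test statistic)."""
    return sum(x) / len(x) - sum(y) / len(y)


def exact_permutation_pvalue(x, y):
    """Two-sided p-value from every way of splitting the pooled data
    into groups of the original sizes (an exact permutation test)."""
    pooled = list(x) + list(y)
    n = len(x)
    observed = abs(mean_diff(x, y))
    extreme = 0
    total = 0
    for idx in itertools.combinations(range(len(pooled)), n):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(gx, gy)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def resampling_permutation_pvalue(x, y, n_resamples=10000, seed=0):
    """Two-sided p-value from a random subset of arrangements
    (a resampling approximation permutation test)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n = len(x)
    observed = abs(mean_diff(x, y))
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs(mean_diff(pooled[:n], pooled[n:])) >= observed - 1e-12:
            extreme += 1
    # Counting the observed arrangement itself keeps the estimate above zero.
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    print(exact_permutation_pvalue(a, b))
    print(resampling_permutation_pvalue(a, b))
```

The exact version enumerates every split of the pooled data, so its cost grows combinatorially with sample size; the resampling version trades that cost for Monte Carlo error controlled by `n_resamples`.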
Five procedures to calculate the probability of weighted kappa with multiple raters under the null hypothesis of independence are described and compared in terms of accuracy, ease of use, generality, and limitations. The five procedures are (1) exact variance, (2) resampling contingency, (3) intraclass correlation, (4) randomized block, and (5) resampling block. While each procedure possesses strengths and limitations, the resampling contingency procedure is shown to be the most versatile and accurate of the five procedures, provided the number of raters is not too large. The resampling contingency procedure permits any weighting scheme, accommodates both symmetrical and asymmetrical weights, is suitable for both weighted and unweighted kappa, and makes no assumptions about either the data distribution or the probability distribution.
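As a rough illustration of the resampling idea, the following Python sketch estimates a permutation p-value for weighted kappa. It is a simplified two-rater analog and is not the multiple-rater resampling contingency procedure evaluated in the article; the function names, the linear disagreement weights used as the default, and the one-sided upper-tail p-value are all assumptions.

```python
import random


def weighted_kappa(r1, r2, n_categories, weight=lambda i, j: abs(i - j)):
    """Weighted kappa for two raters, using disagreement weights.

    Ratings are integers in range(n_categories). `weight(i, j)` may be
    any function, symmetric or not; 0 is taken to mean full agreement.
    Raises ZeroDivisionError if the expected disagreement is zero.
    """
    n = len(r1)
    # Observed mean disagreement across rated items.
    observed = sum(weight(a, b) for a, b in zip(r1, r2)) / n
    # Marginal proportions for each rater.
    p1 = [r1.count(c) / n for c in range(n_categories)]
    p2 = [r2.count(c) / n for c in range(n_categories)]
    # Expected mean disagreement under independence of the two raters.
    expected = sum(
        weight(i, j) * p1[i] * p2[j]
        for i in range(n_categories)
        for j in range(n_categories)
    )
    return 1.0 - observed / expected


def kappa_resampling_pvalue(r1, r2, n_categories, n_resamples=5000, seed=0):
    """Upper-tail p-value: share of shuffles with kappa >= observed kappa.

    Shuffling one rater's ratings breaks any association between raters
    while keeping both sets of marginal totals fixed.
    """
    rng = random.Random(seed)
    observed = weighted_kappa(r1, r2, n_categories)
    shuffled = list(r2)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if weighted_kappa(r1, shuffled, n_categories) >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    rater_a = [0, 1, 2, 0, 1, 2]
    rater_b = [0, 1, 2, 0, 1, 2]
    print(weighted_kappa(rater_a, rater_b, 3))
    print(kappa_resampling_pvalue(rater_a, rater_b, 3))
```

Because the weights enter only through the `weight` argument, the same sketch covers unweighted kappa (a 0/1 weight) and asymmetric schemes, and it relies on no distributional assumption beyond exchangeability under the null hypothesis.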