Mokken scale analysis is a popular method for scaling dichotomous and polytomous items. Whether items form a scale is determined by three types of scalability coefficients: (1) for pairs of items, (2) for individual items, and (3) for the entire scale. It has become standard practice to interpret the sample values of these scalability coefficients using Mokken’s guidelines, which have been available since the 1970s. For a valid assessment of scalability, however, the standard errors of the scalability coefficients must be taken into account. Until now, standard errors were not available for scales consisting of Likert items, the most popular item type in sociology, and standard errors could be computed for dichotomous items only if the number of items was small. This study solves these two problems. First, we derived standard errors for Mokken’s scalability coefficients using a marginal modeling framework; these standard errors can be computed for all types of items used in Mokken scale analysis. Second, we showed that the method can be applied to scales consisting of large numbers of items. Third, we applied Mokken scale analysis to a set of polytomous items measuring tolerance. The analysis showed that ignoring the standard errors of scalability coefficients may result in incorrect inferences.
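To make the pairwise coefficient concrete, here is a minimal sketch for dichotomous items, where H_ij = 1 − F_ij / E_ij compares observed Guttman errors with those expected under marginal independence. This is only an illustration of the point estimate; it is not the marginal-modeling derivation of standard errors described in the abstract, and the function name `h_pair` is our own.

```python
import numpy as np

def h_pair(x, y):
    """Pairwise scalability coefficient H_ij for two dichotomous items.

    H_ij = 1 - F_ij / E_ij, where F_ij is the observed number of
    Guttman errors and E_ij the number expected under marginal
    independence. Items are ordered by popularity: a Guttman error
    is passing the harder item while failing the easier one.
    """
    x, y = np.asarray(x), np.asarray(y)
    if x.mean() > y.mean():      # make x the harder (less popular) item
        x, y = y, x
    n = len(x)
    observed_errors = np.sum((x == 1) & (y == 0))
    expected_errors = x.sum() * (n - y.sum()) / n
    return 1 - observed_errors / expected_errors
```

A perfect Guttman pattern (no one passes the harder item while failing the easier one) yields H_ij = 1, while statistically independent items yield H_ij near 0.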
We discuss the statistical testing of three relevant hypotheses involving Cronbach's alpha: one where alpha equals a particular criterion; a second testing the equality of two alpha coefficients for independent samples; and a third testing the equality of two alpha coefficients for dependent samples. For each of these hypotheses, various statistical tests have been proposed. Over the years, these tests have depended on progressively fewer assumptions. We propose a new approach to testing the three hypotheses that relies on even fewer assumptions, is especially suited for discrete item scores, and can be applied easily to tests containing large numbers of items. The new approach uses marginal modelling. We compared the Type I error rate and the power of the marginal modelling approach to several of the available tests in a simulation study using realistic conditions. We found that the marginal modelling approach had the most accurate Type I error rates, whereas the power was similar across the statistical tests.
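As a reminder of the statistic the three hypotheses concern, here is a minimal sketch of coefficient alpha itself, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). It is not the marginal-modelling tests proposed in the paper, and the function name is our own.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an n-persons x k-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of
    the total score), with unbiased (ddof=1) variance estimates.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```

Identical (perfectly parallel) items give alpha = 1; items whose covariances sum to zero give alpha = 0.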
Mokken scale analysis uses three types of scalability coefficients to assess the quality of (a) pairs of items, (b) individual items, and (c) an entire scale. Both the point estimates and the standard errors of the scalability coefficients assume that the sample ordering of the item steps is identical to the population ordering, but due to sampling error, the sample ordering may be incorrect and, consequently, the estimates and the standard errors may be biased. Two simulation studies were used to investigate the bias of the estimates and the standard errors of the scalability coefficients, as well as the coverage of the 95% confidence intervals. The distance between item steps was the most important design factor. In addition, sample size, number of items, number of answer categories, and item discrimination were included in the design. Bias of the standard errors was negligible. Bias of the estimates was largest when all item steps were identical in the population, especially for small sample sizes. Furthermore, bias of the estimates decreased as the number of answer categories increased and as item discrimination decreased. Coverage of the 95% confidence intervals was close to .950, but coverage deteriorated for small sample sizes. Coverage also became poorer as the number of items increased, in particular for dichotomous items.
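Coverage of a nominal 95% confidence interval is estimated here in the usual Monte Carlo way: simulate many samples, build the interval in each, and count how often it contains the true parameter. The sketch below uses a Wald interval for a population mean as a generic stand-in for a scalability coefficient; the parameters and function name are illustrative, not those of the simulation studies in the abstract.

```python
import numpy as np

def coverage(n_reps=2000, n=50, mu=0.0, seed=1):
    """Monte Carlo estimate of the coverage of a nominal 95% Wald
    interval for a population mean: the proportion of replications
    whose interval contains the true value mu."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        x = rng.normal(mu, 1.0, n)
        half = 1.96 * x.std(ddof=1) / np.sqrt(n)
        hits += (x.mean() - half <= mu <= x.mean() + half)
    return hits / n_reps
```

With well-behaved data and a moderate sample size, the estimated coverage lands close to the nominal .950; deviations below that, as reported in the abstract for small samples, signal that the interval is too narrow.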
Mixture models have been developed to enable detection of within-subject differences in responses and response times to psychometric test items. To enable mixture modeling of both responses and response times, a distributional assumption is needed for the within-state response time distribution. Since violations of the assumed response time distribution may bias the modeling results, choosing an appropriate within-state distribution is important. However, testing this distributional assumption is challenging, because the latent within-state response time distribution is by definition different from the observed distribution; existing tests on the observed distribution therefore cannot be used. In this article, we propose statistical tests on the within-state response time distribution in a mixture modeling framework for responses and response times. We investigate the viability of the newly proposed tests in a simulation study, and we apply the tests to a real data set.
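The distinction between the within-state and the observed distribution can be seen in a small simulation: each response time is drawn from one of two lognormal within-state distributions, so the observed (marginal) distribution is a mixture and is itself no longer lognormal. All parameters and the function name below are hypothetical, chosen only to illustrate this point.

```python
import numpy as np

def simulate_mixture_rts(n=1000, p_fast=0.3, seed=0):
    """Simulate response times from a two-state lognormal mixture
    (hypothetical parameters). The latent state of each response is
    returned alongside the times: tests on the observed marginal
    distribution cannot check the within-state assumption, because
    the marginal mixes the two within-state distributions."""
    rng = np.random.default_rng(seed)
    state = rng.random(n) < p_fast              # latent state per response
    log_rt = np.where(state,
                      rng.normal(-0.5, 0.3, n),  # fast state, e.g. guessing
                      rng.normal(1.0, 0.5, n))   # slow state, e.g. solving
    return np.exp(log_rt), state
```

In the simulated data, responses from the fast state have clearly shorter times than those from the slow state, yet an analyst who only sees the pooled times cannot directly test either within-state lognormal assumption.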