In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons were also made between the IRT approach and two non-IRT approaches: the Livingston-Lewis and compound multinomial procedures. Results for the various IRT model combinations were not substantially different. The estimated classification consistency and accuracy indices for the non-IRT procedures were almost always lower than those for the IRT procedures.
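To make the IRT approach concrete, the following is a minimal sketch of a single-cut-score classification accuracy computation for dichotomous items under the 3PL model, using the Lord-Wingersky recursion to obtain the conditional summed-score distribution. The item parameters, test length, cut score, and ability distribution here are illustrative assumptions, not values from the article.

```python
# Sketch: IRT-based classification accuracy for dichotomous items.
# All numeric values below are illustrative assumptions.
import numpy as np

def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

def summed_score_dist(theta, a, b, c):
    """Lord-Wingersky recursion: distribution of the summed score
    given theta for a set of dichotomous items."""
    probs = p_correct(theta, a, b, c)
    dist = np.array([1.0])
    for p in probs:
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1 - p)   # item answered incorrectly
        new[1:] += dist * p          # item answered correctly
        dist = new
    return dist

# Illustrative 20-item test with a cut score of 12 (assumptions).
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.0, 20)
b = rng.normal(0.0, 1.0, 20)
c = np.full(20, 0.2)
cut = 12

# Quadrature over an assumed standard normal ability distribution.
nodes = np.linspace(-4, 4, 81)
weights = np.exp(-nodes**2 / 2)
weights /= weights.sum()

# theta_cut: the ability whose true (expected) score equals the cut.
theta_cut = nodes[np.argmin(np.abs(
    np.array([p_correct(t, a, b, c).sum() for t in nodes]) - cut))]

accuracy = 0.0
for t, w in zip(nodes, weights):
    dist = summed_score_dist(t, a, b, c)
    p_above = dist[cut:].sum()       # P(observed score >= cut | theta)
    # Accurate classification: observed decision matches true status.
    accuracy += w * (p_above if t >= theta_cut else 1 - p_above)
print(f"estimated classification accuracy: {accuracy:.3f}")
```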
This article describes procedures for estimating various indices of classification consistency and accuracy for multiple category classifications using data from a single test administration. The estimates of the classification consistency and accuracy indices are compared under three different psychometric models: the two-parameter beta binomial, four-parameter beta binomial, and three-parameter logistic IRT (item response theory) models. Using real data sets, the estimation procedures are illustrated, and the characteristics of the estimated classification indices are examined. This article also examines the behavior of the estimated classification indices as a function of the latent variable. All three components of the models (i.e., the estimated true score distributions, fitted observed score distributions, and estimated conditional error variances) appear to have considerable influence on the magnitudes of the estimated classification indices. Choosing a model in practice should be based on various considerations, including the degree of model fit to the data, the suitability of the model assumptions, and computational feasibility.
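The two-parameter beta binomial case lends itself to a compact illustration: with a beta true-score density and binomial conditional error, the single-administration consistency index is the integral, over the true-score distribution, of the squared probabilities of landing in each category. The sketch below assumes illustrative beta parameters, test length, and cut scores; none of these values come from the article.

```python
# Sketch: classification consistency under the two-parameter
# beta binomial model. Parameter values are illustrative assumptions.
import numpy as np
from scipy.stats import beta, binom
from scipy.integrate import trapezoid

alpha_, beta_ = 8.0, 4.0            # assumed beta true-score distribution
n_items = 40                        # assumed test length
cuts = [0, 20, 30, n_items + 1]     # three categories: [0,20), [20,30), [30,40]

# Quadrature grid over the true proportion-correct score tau.
taus = np.linspace(1e-6, 1 - 1e-6, 501)
g = beta.pdf(taus, alpha_, beta_)
g /= trapezoid(g, taus)

consistency = 0.0
for lo, hi in zip(cuts[:-1], cuts[1:]):
    # P(observed score falls in this category | tau), for every tau.
    p_cat = binom.cdf(hi - 1, n_items, taus) - binom.cdf(lo - 1, n_items, taus)
    # Two independent administrations agree in this category with prob p_cat^2.
    consistency += trapezoid(p_cat**2 * g, taus)
print(f"estimated classification consistency: {consistency:.3f}")
```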
A new univariate sampling approach for bootstrapping correlation coefficients is proposed and evaluated. Bootstrapping correlations to define confidence intervals or to test hypotheses has previously relied on repeated bivariate sampling of observed (x, y) values to create an empirical sampling distribution. Bivariate sampling matches the logic of confidence interval construction, but hypothesis testing logic suggests that x and y should be sampled independently. This study uses Monte Carlo methods to compare the univariate bootstrap with three bivariate bootstrap procedures and with the traditional parametric procedure, using various sample sizes, population correlations, and population distributions. Results suggest that the univariate bootstrap is superior to other bootstrap procedures in many hypothesis testing settings, and even improves on parametric hypothesis testing in certain cases.
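The distinction between the two sampling schemes is easy to state in code. The sketch below is a minimal illustration, not the article's implementation; the simulated data, sample size, and number of replications are assumptions chosen for the example.

```python
# Sketch: bivariate vs. univariate bootstrap of a correlation coefficient.
# Data and replication counts are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.3 * x + rng.normal(size=50)    # assumed sample with modest correlation
r_obs = np.corrcoef(x, y)[0, 1]
n, B = len(x), 2000

biv, uni = np.empty(B), np.empty(B)
for i in range(B):
    # Bivariate bootstrap: resample (x, y) pairs together, preserving
    # their association; suited to confidence interval construction.
    idx = rng.integers(0, n, n)
    biv[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    # Univariate bootstrap: resample x and y independently, which
    # imposes the null hypothesis of zero correlation.
    ix, iy = rng.integers(0, n, n), rng.integers(0, n, n)
    uni[i] = np.corrcoef(x[ix], y[iy])[0, 1]

# Two-sided p-value from the univariate (null) sampling distribution.
p_value = np.mean(np.abs(uni) >= abs(r_obs))
print(f"r = {r_obs:.3f}, univariate bootstrap p = {p_value:.3f}")

# Percentile confidence interval from the bivariate resamples.
ci = np.percentile(biv, [2.5, 97.5])
print(f"95% percentile CI (bivariate): [{ci[0]:.3f}, {ci[1]:.3f}]")
```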