Generalizability theory provides a comprehensive framework for determining how multiple sources of measurement error affect scores from psychological assessments and for using that information to improve those assessments. Although generalizability theory designs have traditionally been analyzed using analysis of variance (ANOVA) procedures, the same analyses can be replicated and extended using structural equation models. We collected multi-occasion data from inventories measuring numerous dimensions of personality, self-concept, and socially desirable responding to compare variance components, generalizability coefficients, dependability coefficients, and proportions of universe score and measurement error variance estimated using structural equation modeling versus ANOVA techniques. We further applied structural equation modeling techniques to continuous latent response variable metrics and derived Monte Carlo-based confidence intervals for those indices on both observed score and continuous latent response variable metrics. Results for observed scores estimated using structural equation modeling and ANOVA procedures rarely differed. Differences in reliability between raw score and continuous latent response variable metrics were much greater for scales with dichotomous response formats, highlighting the value of conducting analyses on both metrics to evaluate gains that might be achieved by increasing the number of response options. We provide detailed guidelines for applying the demonstrated techniques using structural equation modeling and ANOVA-based statistical software.
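To make the ANOVA-based side of these analyses concrete, the sketch below estimates variance components for a simple one-facet persons x occasions design and converts them into generalizability (relative) and dependability (absolute) coefficients. This is a minimal sketch under stated assumptions, not the authors' exact procedure: the data frame dat and its columns person, occasion, and score are hypothetical, and lme4's random-effects decomposition stands in for the expected-mean-squares calculations that traditional ANOVA-based GT software would perform.

```r
# Minimal sketch: one-facet persons x occasions G-study.
# Assumes a hypothetical long-format data frame `dat` with columns
# person, occasion, and score (one score per person-occasion cell).
library(lme4)

# Random-effects decomposition mirroring the ANOVA model:
# score = mu + person + occasion + residual (po interaction + error, confounded)
fit <- lmer(score ~ 1 + (1 | person) + (1 | occasion), data = dat)

vc      <- as.data.frame(VarCorr(fit))
var_p   <- vc$vcov[vc$grp == "person"]    # universe score variance
var_o   <- vc$vcov[vc$grp == "occasion"]  # occasion main effect
var_res <- vc$vcov[vc$grp == "Residual"]  # po interaction + random error

n_o <- 2  # number of occasions assumed for the D-study
g_coef <- var_p / (var_p + var_res / n_o)             # relative (norm-referenced)
d_coef <- var_p / (var_p + (var_o + var_res) / n_o)   # absolute (criterion-referenced)
```

The dependability coefficient adds the occasion main effect to the error term because absolute decisions are affected by shifts in overall score levels across occasions, whereas relative decisions are not.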
Over the last decade, applications of bifactor modeling within clinical settings have increased markedly but typically rely on data collected on single occasions. A shortcoming of such research is that reliability coefficients are likely inflated because key sources of measurement error are inadequately modeled and/or confounded with construct variance. We address these problems using three variations of multi-occasion bifactor models with Bayesian-derived parameter estimates to separate systematic variance into general and group factor effects and measurement error into three subcomponents (transient, specific-factor, and random-response). Collectively, these models produce indices of reliability and validity aligned with both standard confirmatory factor models and generalizability designs that extend interpretations of results to the broader domains from which items and occasions are sampled. We demonstrate how these techniques can provide new insights into psychometric properties of scores using Negative Emotionality domain and facet scales from the newly updated Big Five Inventory (BFI-2; Soto & John, 2017). Overall, the two-occasion congeneric bifactor model provided the best fit to the data and the most informative indices for revising measures, examining the dimensionality of composite and subscale scores, and evaluating the viability of those scores. We include code in R for analyzing all models in our extended online Supplemental Material.
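The lavaan sketch below illustrates the general structure of such a model in simplified form: a general factor spans items from both occasions, orthogonal occasion factors capture transient error, same-item residual covariances across occasions capture specific-factor error, and remaining item residual variance reflects random-response error. All variable names (x1_t1 through x4_t2) are hypothetical, group (facet) factors are omitted for brevity, and maximum likelihood estimation is used rather than the Bayesian estimation described above, so this is an assumed simplification rather than the authors' exact specification.

```r
library(lavaan)

# Hypothetical items x1-x4 administered on two occasions (suffixes _t1, _t2)
model <- '
  # General factor spanning all items and occasions
  gen  =~ x1_t1 + x2_t1 + x3_t1 + x4_t1 + x1_t2 + x2_t2 + x3_t2 + x4_t2

  # Occasion factors capturing transient measurement error
  occ1 =~ x1_t1 + x2_t1 + x3_t1 + x4_t1
  occ2 =~ x1_t2 + x2_t2 + x3_t2 + x4_t2

  # Same-item residual covariances across occasions: specific-factor error;
  # the leftover item residual variance reflects random-response error
  x1_t1 ~~ x1_t2
  x2_t1 ~~ x2_t2
  x3_t1 ~~ x3_t2
  x4_t1 ~~ x4_t2
'

# orthogonal = TRUE fixes all factor covariances to zero, per bifactor convention
fit <- cfa(model, data = dat, std.lv = TRUE, orthogonal = TRUE)
summary(fit, fit.measures = TRUE, standardized = TRUE)
```

Squared standardized loadings on the general and occasion factors, together with the partitioned residual variances, can then be summed into the reliability and error proportions that parallel the GT indices described above.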
Public Significance Statement
We demonstrate how multi-occasion bifactor models can be used to gauge the effects of multiple sources of measurement error, revise measures to reduce such error, improve model fit, assess the dimensionality and viability of scale scores, and interpret results from both factor analytic and generalizability theory perspectives. To take advantage of these potential benefits, we provide extensive online Supplemental Material with code in R that allows readers to apply all illustrated techniques.
In recent years, researchers have described how to analyze generalizability theory (GT) based univariate, multivariate, and bifactor designs using structural equation models. However, within GT studies of bifactor models, variance components have been restricted to those reflecting relative differences in scores for norm-referencing purposes, with little guidance provided for estimating key indices when making changes to measurement procedures. In this article, we demonstrate how to derive variance components for multi-facet GT-based bifactor model designs that represent both relative and absolute differences in scores for norm- or criterion-referencing purposes using scores from selected scales within the recently expanded form of the Big Five Inventory (BFI-2). We further develop and apply prophecy formulas for determining how changes in numbers of items, numbers of occasions, and universes of generalization affect a wide variety of indices instrumental in determining the best ways to change measurement procedures for specific purposes. These indices include coefficients representing score generalizability and dependability; scale viability and added value; and proportions of observed score variance attributable to general factor effects, group factor effects, and individual sources of measurement error. To enable readers to apply these techniques, we provide detailed formulas, code in R, and sample data for conducting all demonstrated analyses within this article.
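The prophecy calculations described here generalize the familiar Spearman-Brown logic to multiple facets. As a hedged illustration, the R function below computes generalizability and dependability coefficients for a two-facet persons x items x occasions design from single-condition variance components; the function name, argument names, and input values are all hypothetical, and in practice the components would come from a prior G-study rather than being supplied by hand.

```r
# Prophecy sketch for a persons x items x occasions design. Argument names
# are hypothetical; variance components would come from a prior G-study.
gt_prophecy <- function(var_p, var_i, var_o, var_pi, var_po, var_io, var_pio,
                        n_i, n_o) {
  # Relative error: interactions involving persons, averaged over facet sizes
  rel_err <- var_pi / n_i + var_po / n_o + var_pio / (n_i * n_o)
  # Absolute error adds facet main effects and their interaction
  abs_err <- rel_err + var_i / n_i + var_o / n_o + var_io / (n_i * n_o)
  c(generalizability = var_p / (var_p + rel_err),
    dependability    = var_p / (var_p + abs_err))
}

# Illustrative (made-up) components: compare 6 items on 1 occasion
# with 12 items on 2 occasions
gt_prophecy(.40, .05, .02, .20, .08, .01, .25, n_i = 6,  n_o = 1)
gt_prophecy(.40, .05, .02, .20, .08, .01, .25, n_i = 12, n_o = 2)
```

Running the function across a grid of n_i and n_o values shows directly how lengthening a scale or adding occasions trades off against each source of error, which is the decision the prophecy formulas are designed to inform.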