Policy capturing is a widely used technique, but the temporal stability of policy-capturing judgments has long been a cause for concern. This article emphasizes the importance of reporting reliability estimates, in particular test-retest reliability estimates, in policy-capturing studies. We found that only 164 of 955 policy-capturing studies (i.e., 17.17%) reported a test-retest reliability estimate. We then conducted a reliability generalization meta-analysis on the policy-capturing studies that did report test-retest reliability estimates and obtained an average reliability estimate of .78. We additionally examined 16 potential methodological and substantive antecedents of test-retest reliability (equivalent to moderators in validity generalization studies). We found that test-retest reliability was robust to variation in 14 of the 16 factors examined, but that reliability was higher in paper-and-pencil studies than in web-based studies and higher for behavioral intention judgments than for other (e.g., attitudinal and perceptual) judgments. We provide an agenda for future research. Finally, we offer several best-practice recommendations for researchers (and journal reviewers) with regard to (a) reporting test-retest reliability, (b) designing policy-capturing studies so that test-retest reliability can be appropriately reported, and (c) properly interpreting test-retest reliability in policy-capturing studies.
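To make the two quantities at the center of this abstract concrete, the sketch below illustrates (a) a test-retest reliability estimate, computed as the correlation between one judge's ratings of the same cue profiles on two occasions, and (b) a sample-size-weighted average of reliability estimates across studies, the kind of summary a reliability generalization meta-analysis produces. All ratings, coefficients, and sample sizes are invented for illustration; this is a minimal sketch, not the authors' data or analysis code.

```python
# Illustrative sketch only: made-up numbers, not the study's data or procedure.
import numpy as np

# (a) One judge rates the same 10 cue profiles at two points in time.
time1 = np.array([4, 6, 5, 7, 3, 6, 2, 5, 6, 4], dtype=float)
time2 = np.array([5, 6, 4, 7, 3, 5, 2, 6, 6, 4], dtype=float)
retest_r = np.corrcoef(time1, time2)[0, 1]  # test-retest reliability for this judge
print(f"Test-retest reliability for this judge: {retest_r:.2f}")

# (b) Sample-size-weighted mean reliability across hypothetical studies,
# analogous to the average estimate reported in a reliability generalization.
study_r = np.array([0.72, 0.81, 0.79, 0.85])   # reported reliabilities
study_n = np.array([40, 120, 60, 200])          # study sample sizes
mean_r = np.average(study_r, weights=study_n)
print(f"Sample-size-weighted average reliability: {mean_r:.2f}")
```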
Natural language processing (NLP) techniques are becoming increasingly popular in industrial and organizational psychology. One promising area for NLP-based applications is scale development; yet, while many possibilities exist, these applications have so far been restricted, focusing mainly on automated item generation. The current research expands this potential by illustrating an NLP-based approach to content analysis, the task of categorizing scale items by the constructs they measure, which has traditionally been performed manually. In NLP, content analysis is framed as a text classification task in which a model is trained to automatically assign scale items to the construct they measure. Here, we present an approach to text classification, using state-of-the-art transformer models, that builds upon past approaches. We begin by introducing transformer models and their advantages over alternative methods. Next, we illustrate how to train a transformer to content analyze Big Five personality items. Then, we compare the trained models to human raters, finding that the transformer models outperform both the human raters and several alternative models. Finally, we present practical considerations, limitations, and future research directions.
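As a concrete illustration of the text classification task described above, the sketch below fine-tunes a transformer (via the Hugging Face transformers and datasets libraries) to assign personality items to Big Five constructs. The base model, the handful of example items, and the hyperparameters are illustrative assumptions; this is a minimal sketch, not the authors' implementation or data.

```python
# Minimal sketch of transformer-based item classification (assumed setup,
# not the authors' code): fine-tune a pretrained encoder to assign scale
# items to the Big Five construct they measure.
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]
label2id = {name: i for i, name in enumerate(labels)}

# Hypothetical labeled items: (item text, construct it measures).
train_items = [
    ("I have a vivid imagination.", "openness"),
    ("I am always prepared.", "conscientiousness"),
    ("I am the life of the party.", "extraversion"),
    ("I sympathize with others' feelings.", "agreeableness"),
    ("I get stressed out easily.", "neuroticism"),
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

def tokenize(batch):
    # Convert item text into input IDs and attention masks.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=32)

dataset = Dataset.from_dict({
    "text": [text for text, _ in train_items],
    "label": [label2id[construct] for _, construct in train_items],
}).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="item_classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()

# Classify a new, unseen item.
inputs = tokenizer("I enjoy trying new things.",
                   return_tensors="pt").to(model.device)
with torch.no_grad():
    predicted = model(**inputs).logits.argmax(dim=-1).item()
print(labels[predicted])
```

In practice one would train on the full pool of labeled items and evaluate with held-out items or cross-validation before comparing model assignments to human raters; the toy training set here exists only to keep the example self-contained.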