This article presents methods for assessing agreement among the judgments made by a single group of judges on a single variable in regard to a single target. For example, the group of judges could be editorial consultants, members of an assessment center, or members of a team. The single target could be a manuscript, a lower-level manager, or a team. The variable on which the target is judged could be overall publishability in the case of the manuscript, managerial potential for the lower-level manager, or team cooperativeness for the team. The methods presented are based on new procedures for estimating interrater reliability. For situations such as the above, these procedures are shown to furnish more accurate and interpretable estimates of agreement than estimates provided by procedures commonly used to estimate agreement, consistency, or interrater reliability. In addition, the proposed methods include processes for controlling for the spurious influences of response biases (e.g., positive leniency, social desirability) on estimates of interrater reliability.

Many occasions arise in research and practice when it is useful to have an estimate of interrater reliability for judgments of a single target by one set of judges. Examples include the needs to estimate interrater reliability among judges' ratings of (a) the level of performance indicated by a potential anchor for a Behaviorally Anchored Rating Scale (BARS) in the development phases of a BARS, and (b) the overall "publishability" of a manuscript submitted for journal review. In these examples, the "variable" consists of a single item, with a rating scale such as a 7-point performance scale. It is also helpful to have an index of interrater reliability when scores on a variable consist of means taken over items that
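For the single-item case described here, the within-group agreement statistic of James, Demaree, and Wolf (1984) compares the observed variance of the judges' ratings with the variance expected if judges responded randomly. Below is a minimal Python sketch, assuming the standard single-item form of the statistic with a uniform null distribution over an A-point scale; the article's procedures for controlling response bias substitute a skewed null distribution, which is omitted here. The function name and example ratings are illustrative, not taken from the article.

```python
import numpy as np

def r_wg(ratings, scale_points):
    """Within-group interrater agreement for one target on a single item.

    r_WG = 1 - (S_x^2 / sigma_EU^2), where S_x^2 is the observed variance
    of the judges' ratings and sigma_EU^2 = (A^2 - 1) / 12 is the variance
    expected under random responding on a uniform A-point scale.
    """
    ratings = np.asarray(ratings, dtype=float)
    observed_var = ratings.var(ddof=1)           # sample variance across judges
    null_var = (scale_points**2 - 1) / 12.0      # uniform-null expected variance
    return 1.0 - observed_var / null_var

# Ten judges rate one manuscript's publishability on a 7-point scale.
ratings = [5, 6, 5, 6, 6, 5, 7, 6, 5, 6]
print(round(r_wg(ratings, scale_points=7), 3))   # ~0.886: strong agreement
```

A value near 1 indicates that the judges' ratings vary far less than random responding would produce; a value near 0 indicates essentially no agreement beyond chance.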
Schmidt and Hunter (1989) critiqued the within-group interrater reliability statistic (r_WG) described by James, Demaree, and Wolf (1984). Kozlowski and Hattrup (1992) responded to the Schmidt and Hunter critique and argued that r_WG is a suitable index of interrater agreement. This article focuses on the interpretation of r_WG as a measure of agreement among judges' ratings of a single target. A new derivation of r_WG is given that underscores this interpretation.
Validity generalization procedures are reviewed and found to be subject to the logical fallacy of affirming the consequent. Alternative models may explain variation in validity coefficients as well as the cross-situational consistency model espoused by many users of the validity generalization approach. Moreover, some of the assumptions that form the statistical foundation of validity generalization work are open to question. Use of Fisher z transformations in validity generalization analyses removes most of these problems and will usually produce more conservative estimates of the degree to which sampling error may account for variability in correlations.

The hypothesis that validities are "situationally specific" in the context of personnel research has been addressed in a number of recent studies and reviews (cf.
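A brief sketch of how the Fisher z approach works in this setting: each validity coefficient r is transformed to z = (1/2) ln((1 + r)/(1 - r)), whose sampling variance is approximately 1/(N - 3), and the observed between-study variance of the z values is compared with the variance expected from sampling error alone. The toy coefficients, sample sizes, and names below are illustrative, not from the article.

```python
import numpy as np

def fisher_z(r):
    """Fisher z transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * np.log((1 + r) / (1 - r))

# Toy data: observed validity coefficients and sample sizes from k = 5 studies.
r = np.array([0.20, 0.35, 0.15, 0.30, 0.25])
n = np.array([80, 120, 60, 100, 90])

z = fisher_z(r)
observed_var = z.var(ddof=1)            # between-study variance of z
expected_var = np.mean(1.0 / (n - 3))   # sampling-error variance of z, ~1/(N-3)
pct_artifact = min(expected_var / observed_var, 1.0)

print(f"proportion of variance attributable to sampling error: {pct_artifact:.2f}")
```

In this toy example the expected sampling-error variance exceeds the observed variance, so sampling error alone can account for all of the between-study variability, illustrating why the z metric tends to yield more conservative conclusions.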
A primary objective of validity generalization (VG) analysis is to decompose the between-situation variance in validities into (a) variance attributable to between-situation differences in statistical artifacts and (b) variance attributable to between-situation differences in (unidentified) situational moderators. This process is based on the assumption that the effects of statistical artifacts on validities are independent of the effects of situational moderators on validities. The present article questions the independence assumption by theoretically integrating situational variables into the VG estimation process. It is shown that the independence assumption may be untenable because at least one artifact, criterion reliability, is a function of, rather than independent of, the situational variables that moderate validities. An alternative approach to VG analysis is recommended. This approach rests heavily on proactive research designs in which potential situational moderators are included in the generalizability analysis.

We would like to thank Charles Glisson for his helpful suggestions and advice.
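To make the independence assumption concrete, here is a minimal sketch of the conventional decomposition the abstract describes, using the standard sampling-error artifact formula Var_e = (1 - r_bar^2)^2 / (N - 1); which specific VG procedure the article has in view is not stated in the abstract, so this is one common instantiation rather than the authors' own. Data and names are illustrative.

```python
import numpy as np

# Toy validity coefficients and sample sizes across k = 6 situations.
r = np.array([0.18, 0.33, 0.12, 0.28, 0.22, 0.30])
n = np.array([75, 110, 65, 95, 85, 130])

r_bar = np.average(r, weights=n)                     # weighted mean validity
observed_var = np.average((r - r_bar) ** 2, weights=n)

# Expected sampling-error variance of r (a statistical artifact),
# averaged over situations.
error_var = np.mean((1 - r_bar**2) ** 2 / (n - 1))

# The residual is conventionally attributed to situational moderators.
# Subtracting this way presumes artifacts and moderators contribute
# additively and independently, which is the assumption the article questions.
residual_var = max(observed_var - error_var, 0.0)

print(f"observed variance:  {observed_var:.5f}")
print(f"artifact variance:  {error_var:.5f}")
print(f"moderator variance: {residual_var:.5f}")
```

If an artifact such as criterion reliability itself varies with the situational moderators, the artifact and moderator components are confounded and this additive subtraction misallocates variance, which is the article's central point.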