Rating scales have no inherent reliability that is independent of the observers who use them. The often-reported interrater reliability is an average of perhaps quite different individual rater reliabilities. Given a number of independent raters who observe the same sample of ratees, it is possible to separate out the individual rater reliabilities. Under certain assumptions, an external measure can replace one of the raters, and the individual reliabilities of two independent raters can be estimated. In a somewhat similar fashion, estimates of treatment effects present in ratings by two independent raters can provide the external frame of reference against which differences in their individual reliabilities can be evaluated. Models for estimating individual rater reliabilities are provided for use in selecting, evaluating, and training participants in clinical research.

Rating scales provide essential measurements for much clinical research. References to the reliability of a rating instrument are common; however, clinical rating scales do not have inherent reliability that is independent of the skill of the observers who use them. The fact that raters differ in the accuracy of their judgments, and in the consistency with which those judgments are recorded, is a major problem for clinical research. This paper describes several models that provide estimates of the reliability of a rating scale used by a particular individual rater. The use of such models may be helpful in selecting and training raters for participation in clinical research.

Concern with interrater consistency has generally focused on the reliability of the rating instrument or procedure. Estimates of the reliability of mean ratings calculated across raters, and of the average reliability of the individual raters using an instrument, have been provided (Armstrong, 1981). Haggard (1958) and Winer (1962) were among the first to introduce psychologists to the intraclass correlation coefficient as a measure of average interrater reliability, and Shrout and Fleiss (1979) have elaborated the statistical models underlying the intraclass correlation coefficient. Cronbach, Gleser, Nanda, and Rajaratnam (1972) extended the components-of-variance approach to complex experimental designs in which variability among different raters is one factor influencing "generalizability." However, none of those authors discusses the problem of separating out the differing reliabilities of individual raters as such. More recent authors have compared other approaches to estimating overall consistency among several raters, as distinct from the specific reliabilities of individual raters. Coefficient alpha can be calculated (for standardized ratings) from the correlation matrix relating ratings made by several raters; in that regard, it is more logically compared with the models for individual rater reliabilities discussed here than is the intraclass coefficient obtained from an analysis of variance model. This comparison emphasizes further the difference between average rater ...
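To make the distinction concrete, the following sketch (not one of the models developed in this paper) computes the two average-consistency summaries mentioned above from a hypothetical ratees-by-raters matrix: a single-rater and a mean-rating intraclass correlation from the two-way analysis of variance decomposition, in the sense of Shrout and Fleiss (1979), and standardized coefficient alpha from the interrater correlation matrix. The data and variable names are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical ratings: rows are ratees (targets), columns are raters.
X = np.array([
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
], dtype=float)

n, k = X.shape  # n ratees, k raters

# Two-way ANOVA decomposition (ratees x raters).
grand = X.mean()
ss_total = ((X - grand) ** 2).sum()
ss_rows = k * ((X.mean(axis=1) - grand) ** 2).sum()   # between ratees
ss_cols = n * ((X.mean(axis=0) - grand) ** 2).sum()   # between raters
ss_err = ss_total - ss_rows - ss_cols
ms_rows = ss_rows / (n - 1)
ms_cols = ss_cols / (k - 1)
ms_err = ss_err / ((n - 1) * (k - 1))

# ICC(2,1): expected reliability of a single rater, averaged over the panel.
icc_2_1 = (ms_rows - ms_err) / (
    ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
)

# ICC(2,k): reliability of the mean rating across the k raters.
icc_2_k = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

# Standardized coefficient alpha from the k x k interrater correlation matrix.
R = np.corrcoef(X, rowvar=False)
r_bar = (R.sum() - k) / (k * (k - 1))        # mean off-diagonal correlation
alpha_std = k * r_bar / (1 + (k - 1) * r_bar)

print(f"ICC(2,1) single-rater reliability : {icc_2_1:.3f}")
print(f"ICC(2,k) mean-rating reliability  : {icc_2_k:.3f}")
print(f"Standardized coefficient alpha    : {alpha_std:.3f}")
```

Both the intraclass coefficients and standardized alpha summarize consistency averaged over the rater panel; neither indicates which individual rater is more or less reliable, which is the problem the models described in this paper address.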