Two experiments explored methods for standardizing ratings of the psychopathology of clinical case histories. In both experiments, the same case histories were rated as more pathological when mostly mild rather than severe cases were presented as the immediate context. Psychometric analyses demonstrated that this type of contextual effect is a potentially important source of unreliability in clinical judgment. In Experiment 1, increasing the number of points in the rating scale from 3 to either 7 or 100 significantly reduced the effects of the immediate context. Ratings were parsimoniously modeled by Parducci's (1983) range-frequency theory. In Experiment 2, providing verbal anchors in the form of either detailed DSM descriptions for each rating category or sample case histories for the two end-categories increased the reliability of the ratings by reducing the effects of the immediate contexts; however, these reductions occurred only when the ranges of the immediate contexts had been severely restricted. According to the range-frequency analysis, verbal anchors served to equate the endpoints of the subjective range for the different contextual conditions. Comparison with previous research suggests that the anchors also reduced the effects of the sequential position in which clinical cases appear. We therefore recommend that studies of the reliability of behavioral assessment techniques take into account the effects of differences in context.

Although reliability does not imply validity, it is a necessary prerequisite. Unfortunately, clinical assessment in psychology is notoriously unreliable. Eysenck, Wakefield, and Friedman (1983) have documented the lack of interrater reliability. Goldberg (1968, 1970) concluded that clinical judgments tend to be unreliable and only minimally related to the degree of experience of the person making the assessment.
Arnoff (1954) reported an actual decrease in reliability with greater clinical experience.

One generally accepted method for improving clinical prediction has been the use of statistical models. In particular, there is overwhelming evidence for the superiority of simple linear models over clinicians in combining information to form a diagnosis (Dawes, 1979; Dawes & Corrigan, 1974; Einhorn, 1972; Goldberg, 1968, 1970; Meehl, 1954). But as Sawyer (1966) has pointed out, combining information is only the second half of the problem, the first half being the problem of measurement. Although there are a large number of objective measurement devices currently in use in clinical psychology, it is doubtful that the clinician will be supplanted in the near future as a primary source of information used in prediction. Therefore, development of greater accuracy of clinical judgments is of fundamen

Preparation of this article was carried out while Douglas H.