In this paper Jöreskog's model of congeneric tests is used to analyze agreement between raters. Raters are treated as measuring instruments. The model of congeneric tests, of which classical parallelism and tau-equivalence are shown to be special cases, is applied to teachers' ratings of students' responses to open-end questions. The findings suggest that the ratings are tau-equivalent and that the ratings given by the teachers are reliable.

EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 1976, 36, 311-317.

It is well known that the reliability of a test is usually estimated by correlating two sets of scores from equivalent forms, split halves, or two administrations of the test, or from analysis of variance computations. When dealing with ratings, a special class of test scores, more than two comparable sets may be available. In a typical rating study, different raters rate a number of subjects on a number of attributes or items. This situation frequently arises in educational and psychological research. Several procedures have been proposed for estimating the reliability of ratings: procedures based upon analysis of variance (e.g., Ebel, 1951; Stanley, 1961; Maxwell and Pilliner, 1968), and those based upon multivariate analysis of variance (e.g., Abelson, 1960; Fleiss, 1966) or a factor-analytic approach (e.g., La Forge, 1965).

Apart from the problem of reliability of ratings, the problem of rater agreement is a pertinent one in view of the frequent use of the average of ratings as, for example, a criterion for validating a selection test. If we take the theoretical position that each rater may be regarded as a test instrument, then the problem of rater