into 20 units by which he transmutes the graphic ratings into scores. Although he states it nowhere definitely, there is the implication that by making the scale less "coarse" there is an increase in accuracy. Cady ('23), using the graphic rating device, uses a scale of twenty-two intervals because "this method permits more elasticity of placement and readier translation into statistical values than that of two or three points as deviation from a mean."

Rating scales using preformed scale divisions vary all the way from two scale divisions up. Rating scales with only two steps have usually assumed the presence or absence of the trait. An odd number of intervals has generally been chosen, for no other reason that the writer can think of than that one usually wants an average position, disliking to make the judgment as to being better or worse than the average in close decisions. Galton ('83) used a nine-point scale, Pearson ('07) used a scale of seven classes, Webb ('15) used a scale of seven classes, Downey ('19) used a scale of eleven classes, Plant ('22) made use of a ten-class scale, Mendenhall ('22) has constructed rating scales with six to eight classes respectively, and the Army Rating Scale contained five classes. Apparently the construction of rating scales has proceeded quite without consideration as to the reason for constructing scales with one rather than another number of classes. This matter hinges on the statistical question as to how much reliability is lowered due to "coarseness of grouping."
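The statistical question raised above can be explored with a small simulation. The sketch below is not the author's method; it is a minimal illustration under stated assumptions: ratings are normally distributed with unit variance, two ratings of the same person correlate at a chosen value before grouping, and classes are of equal width over the range from -3 to +3 standard deviations. The names `pearson`, `to_classes`, and `simulate` are hypothetical helpers introduced only for this example.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def to_classes(values, k, lo=-3.0, hi=3.0):
    """Group continuous ratings into k equal-width classes on [lo, hi].

    Assumption: ratings beyond the range fall into the end classes.
    """
    width = (hi - lo) / k
    out = []
    for v in values:
        idx = int((v - lo) / width)
        out.append(min(max(idx, 0), k - 1))  # clamp into 0 .. k-1
    return out


def simulate(n=5000, reliability=0.8, class_counts=(2, 3, 5, 7, 11, 20), seed=0):
    """Correlate two simulated ratings before and after coarse grouping."""
    rng = random.Random(seed)
    # Each rating is a shared "true" component plus independent error;
    # these weights give unit variance and an expected correlation
    # between the two ungrouped ratings equal to `reliability`.
    a = reliability ** 0.5
    b = (1 - reliability) ** 0.5
    true = [rng.gauss(0, 1) for _ in range(n)]
    r1 = [a * t + b * rng.gauss(0, 1) for t in true]
    r2 = [a * t + b * rng.gauss(0, 1) for t in true]
    results = {"continuous": pearson(r1, r2)}
    for k in class_counts:
        results[k] = pearson(to_classes(r1, k), to_classes(r2, k))
    return results


if __name__ == "__main__":
    for label, r in simulate().items():
        print(f"{label!s:>10}: r = {r:.3f}")
```

Under these assumptions the correlation between the grouped ratings should fall further below the ungrouped value as the number of classes shrinks, with the loss becoming small once the scale has a moderate number of classes. The exact figures depend on the assumed distribution and class boundaries, so they should be read as illustrative only.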
ON THE LOSS OF RELIABILITY IN RATINGS DUE TO COARSENESS OF THE SCALE
This paper proposes to list and discuss the factors which influence the reliability of tests. Were psychologists more conscious of what it is that makes a test reliable, fewer blunders would be made in devising tests which have low and unsatisfactory reliability. The development of the natural sciences depended on the development of exact measurements, and the development of psychology as a science likewise depends on the perfection of its measuring instruments. Much of the recent work in the development of tests, particularly in the measurement of personality, is practically worthless because the tests do not tell a consistent story.

Reliability in this paper is defined as the correlation between two comparable tests. If a test is split so that one half contains items 1, 3, 5, 7, etc., and the other half items 2, 4, 6, 8, etc., these two halves