Validity scores for items were obtained by comparing the rank order of the correlations between each item and six criterion measures with the corresponding rank order for the test as a whole. For samples of about 85 cases, the validity pattern for each of two tests showed limited consistency from sample to sample and the item validity scores had low correlations between samples. While sample size contributed to these undependabilities, the additional undependability of item responses themselves raises strong doubts about the practicality of selecting items on the basis of their validity patterns.
Educational and Psychological Measurement, 1976, 36, 631-637.
Goldberg (1968) has suggested that the ultimate goal of itemmetric research should be the discovery of the relationship between item characteristics and validity. Cooper (1974) attempted such a project. She sought to determine whether the relative validity of personality test items was related to ratings of ambiguity, difficulty of applying the item to oneself, and social desirability.
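The item validity score described in the first abstract can be illustrated with a minimal sketch. It assumes that "comparing the rank order" means a Spearman rank correlation between the item's correlations with the criteria and the whole test's correlations with the same criteria; the abstract does not name the statistic, so that choice, along with every function name and the toy data in the usage note, is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks with 1 for the smallest value; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def validity_pattern(scores, criteria):
    """Correlation of one score vector with each criterion measure."""
    return [pearson(scores, criterion) for criterion in criteria]


def item_validity_score(item_responses, test_scores, criteria):
    """Assumed statistic: Spearman correlation between the item's validity
    pattern and the validity pattern of the test as a whole."""
    item_pattern = validity_pattern(item_responses, criteria)
    test_pattern = validity_pattern(test_scores, criteria)
    return pearson(ranks(item_pattern), ranks(test_pattern))
```

As a usage illustration with made-up data, an item whose responses track the total score exactly reproduces the test's validity pattern: with `test = [1, 2, 3, 4, 5]` and `criteria = [[1, 2, 3, 4, 5], [5, 3, 4, 1, 2], [2, 1, 4, 3, 6]]`, calling `item_validity_score(test, test, criteria)` gives a score of 1 up to floating-point rounding. The study used six criterion measures and samples of about 85 cases; three criteria and five cases are used here only to keep the example short.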
It was hypothesized that feminists would evaluate male- and female-defined traits differently than would nonfeminists and that feminists would rate fewer traits as descriptive of only one sex. Undergraduates rated a set of adjectives for appropriateness to men and women, and for social desirability. Dempewolff's (1973) Feminism II Scale was given to dichotomize the sample into feminists and nonfeminists. The feminists
0361-6843/80/1600-0186$00.95 © 1980 Human Sciences Press