John E. Kurtz scite author profile

We examined data (N = 34,108) on the differential reliability and validity of facet scales from the NEO Inventories. We evaluated the extent to which (a) psychometric properties of facet scales are generalizable across ages, cultures, and methods of measurement; and (b) validity criteria are associated with different forms of reliability. Composite estimates of facet scale stability, heritability, and cross-observer validity were broadly generalizable. Two estimates of retest reliability were independent predictors of the three validity criteria; none of three estimates of internal consistency was. Available evidence suggests the same pattern of results for other personality inventories. Internal consistency of scales can be useful as a check on data quality, but appears to be of limited utility for evaluating the potential validity of developed scales, and it should not be used as a substitute for retest reliability. Further research on the nature and determinants of retest reliability is needed.

Keywords: Reliability; validity; cross-national; Five-Factor Model; personality traits

Scale reliability is commonly said to limit validity (John & Soto, 2007); in principle, more reliable scales should yield more valid assessments (although of course reliability is not sufficient to guarantee validity). For a given set of scales, such as the 30 facets of the NEO Inventories (McCrae & Costa, in press), there is differential reliability: Some facets are more reliable than others. That fact makes it possible to test the maxim that reliability limits validity, provided that criteria of validity are chosen that are comparable across all facet scales: More reliable facets ought to be more valid. We will argue that three relevant criteria are longitudinal stability, heritability, and cross-observer agreement. Each of the 30 NEO facets is known to be more or less stable (Costa, Herbst, McCrae, & Siegler, 2000) and heritable (Jang, McCrae, Angleitner, Riemann, & Livesley, 1998), and to show evidence of some degree of cross-observer agreement (McCrae et al., 2004); however, other things being equal, more reliable facets should be more stable and heritable, and show stronger evidence of consensual validity. There are, however, different forms of reliability, of which internal consistency and retest reliability are the most prominent. In the present article we (a) assemble evidence on the stability, heritability, and cross-observer validity of NEO facets from the published literature; and (b) predict these values from estimates of internal consistency and retest reliability. These analyses allow us to assess the relative importance of these two forms of reliability.

In this article we construe validity broadly to refer to the quality of a scale as a measure of its intended construct. However, our discussion is limited to a consideration of convergent validity; readers should recall that discriminant validity is also an essential attribute of a good sc...
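The two forms of reliability contrasted in the abstract above can be illustrated with a minimal sketch. It assumes Cronbach's alpha as the internal consistency estimate and a Pearson correlation of scale totals across two occasions as the retest estimate; the abstract does not say which estimators the authors used. The function names and the tiny data set are hypothetical and not taken from the article.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Internal consistency: rows holds one list of item scores per respondent."""
    k = len(rows[0])  # number of items in the scale
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation, used here as a retest reliability estimate."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: 5 respondents, 3 items, measured on two occasions.
time1 = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1], [3, 3, 3]]
time2 = [[4, 4, 5], [2, 2, 3], [5, 4, 5], [2, 1, 1], [3, 3, 2]]

alpha_t1 = cronbach_alpha(time1)
retest = pearson_r([sum(r) for r in time1], [sum(r) for r in time2])
print(f"alpha at time 1: {alpha_t1:.2f}, retest r: {retest:.2f}")
```

Alpha is computed from a single administration, while the retest estimate needs the same respondents measured twice, which is why the two can diverge for the same scale.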

Relationship Quality, Trait Similarity, and Self‐Other Agreement on Personality Ratings in College Roommates

Kurtz

Sherker

2003

Journal of Personality

159

Previous research has shown that the level of self-other agreement for personality trait ratings increases with the length of acquaintanceship between the target and the informant. These findings emerge exclusively from studies of well-acquainted pairs in natural relationships and relative strangers interacting in laboratory and classroom settings. The present study examines self-other correlations for trait ratings using the NEO Five Factor Inventory (NEO-FFI; Costa & McCrae, 1992) with 103 pairs of previously unacquainted female college roommates. Assessments were obtained at approximately 2 weeks and again at approximately 15 weeks subsequent to the roommates' initial introduction. Self-other correlations increased for all five NEO-FFI scores and agreement correlations for Conscientiousness were significantly higher than for Extraversion at both occasions. Differences in relationship quality did not moderate self-other agreement for any of the traits. However, better relationship quality was associated with higher other-ratings of Extraversion, Agreeableness, and Conscientiousness and lower other-ratings of Neuroticism after controlling for self-ratings on the same trait. Higher similarity in self-ratings of Neuroticism and Openness was associated with higher self-other agreement for these ratings, and similarity in Conscientiousness was associated with higher relationship quality. These results are considered in light of existing theories of differential trait observability and the effects of unique contexts on trait perception.

Advancing Personality Assessment Terminology: Time to Retire "Objective" and "Projective" As Personality Test Descriptors

Meyer

Kurtz

2006

Journal of Personality Assessment

117

Semantic Response Consistency and Protocol Validity in Structured Personality Assessment: The Case of the NEO-PI-R

Kurtz

Parrish

2001

Journal of Personality Assessment

In this study we tested the hypothesis that groups of NEO Personality Inventory-Revised (NEO-PI-R; Costa & McCrae, 1992a) protocols identified as potentially invalid by an inconsistency scale (INC; Schinka, Kinder, & Kremer, 1997) would show reduced reliability and validity according to a series of psychometric tests. Data were obtained from 2 undergraduate student samples, a self-report group (n = 132) who provided NEO-PI-R self-ratings on 2 occasions separated by a 7- to 14-day interval and an informant group (n = 109) who provided ratings of well-known friends or relatives on 2 occasions separated by a 6-month interval. INC scores from the Time 1 protocols were used to divide these samples into low, moderate, and elevated inconsistency groups. In both samples, these 3 groups showed equivalent levels of reliability and validity as measured by: contingency coefficients for the 20 INC item responses across occasions; test-retest intraclass correlations of NEO-PI-R domain scores; convergent correlations with Goldberg's (1992) Bipolar Adjective Scale scores; and discriminant correlations between the 5 NEO-PI-R domain scores. The similarity of results across self-report and informant assessment contexts provides additional evidence that semantic consistency approaches to assessing protocol validity may overestimate the prevalence of random or careless response behavior in standard administration conditions. Several theories are discussed that accommodate the existence of valid inconsistency in structured personality assessment.

Socially desirable responding in personality assessment: Still more substance than style

Kurtz

Tarquini

Iobst

2008

Personality and Individual Differences

Internal Consistency, Retest Reliability, and Their Implications for Personality Scale Validity
