2001
DOI: 10.1037/0021-9010.86.2.215
Are performance appraisal ratings from different rating sources comparable?

Abstract: The purpose of this study was to test whether a multisource performance appraisal instrument exhibited measurement invariance across different groups of raters. Multiple-groups confirmatory factor analysis as well as item response theory (IRT) techniques were used to test for invariance of the rating instrument across self, peer, supervisor, and subordinate raters. The results of the confirmatory factor analysis indicated that the rating instrument was invariant across these rater groups. The IRT analysis yiel…
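
To make the abstract's invariance-testing step concrete, here is a minimal Python sketch of the chi-square difference (likelihood-ratio) test typically used to compare a constrained multiple-groups CFA model against the configural baseline. The fit statistics are hypothetical placeholders, not values from Facteau and Craig (2001), and the model fitting itself (normally done in SEM software) is omitted.

# Minimal sketch: chi-square difference test between nested multi-group CFA
# models, the usual decision rule in measurement-invariance testing.
# All fit statistics below are hypothetical placeholders.
from scipy.stats import chi2

def chi_square_difference(chisq_constrained, df_constrained,
                          chisq_configural, df_configural):
    # Constrained model (e.g., loadings held equal across self, peer,
    # supervisor, and subordinate raters) vs. the configural model.
    # A non-significant difference is consistent with invariance.
    delta_chisq = chisq_constrained - chisq_configural
    delta_df = df_constrained - df_configural
    p_value = chi2.sf(delta_chisq, delta_df)
    return delta_chisq, delta_df, p_value

d_chi, d_df, p = chi_square_difference(312.4, 160, 298.1, 148)
print(f"delta chi2 = {d_chi:.1f}, delta df = {d_df}, p = {p:.3f}")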

Cited by 138 publications (158 citation statements)
References 44 publications
“…Unfortunately, there is consistent evidence that raters do not agree in their evaluations of ratees. Disagreement is probably more substantial when raters differ in terms of their relationship with the persons to be rated (e.g., supervisors vs. peers), but even when raters are at the same level in an organization, they often do not agree in their evaluations (Facteau & Craig, 2001; Harris & Schaubroeck, 1988; Heneman, 1974; Murphy, Cleveland, & Mohler, 2001; Viswesvaran, Ones, & Schmidt, 1996). There have been intense disagreements about precisely how disagreements among raters should be interpreted and about the more general implications of these disagreements for the psychometric quality of ratings (e.g., Murphy & DeShon, 2000; Ones, Viswesvaran, & Schmidt, 2008), but there is little controversy about the fact that raters do not show the level of agreement one might expect from, for example, two different forms of the same paper-and-pencil test.…”
Section: Disagreement Among Raters (mentioning)
confidence: 99%
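
As a rough illustration of the cross-source disagreement this passage describes, the following Python sketch correlates two simulated rating sources for the same ratees. The data and noise levels are fabricated assumptions for illustration only and do not reproduce any of the cited studies.

# Illustrative only: simulated supervisor and peer ratings of the same ratees,
# each a noisy view of the same underlying performance.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
true_performance = rng.normal(size=50)
supervisor = true_performance + rng.normal(scale=0.9, size=50)
peer = true_performance + rng.normal(scale=0.9, size=50)

r, _ = pearsonr(supervisor, peer)
# Correlations of this kind tend to fall well below what two parallel forms
# of the same written test would be expected to show.
print(f"supervisor-peer agreement r = {r:.2f}")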
“…The Black and White IRT-based IRFs are quite similar in Figure 9, whereas the CFA-based IRFs in Figure 4 are substantially different for the Black and White groups. Facteau and Craig (2001), Maurer, Raju, and Collins (1998), and Laffitte, Raju, Scott, and Fasolo (1998) also reported some measurement nonequivalence results that were not consistent across the two perspectives.…”
Section: IRT Perspective (mentioning)
confidence: 99%
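
For readers unfamiliar with the item response functions (IRFs) compared in this snippet, the short Python sketch below evaluates a two-parameter logistic (2PL) IRF; identical IRFs across groups would indicate item-level measurement equivalence. It uses a dichotomous 2PL with arbitrary parameter values purely for illustration (rating-scale items are more often modeled with graded-response models).

# Illustrative two-parameter logistic (2PL) item response function.
# a = discrimination, b = difficulty; both values are arbitrary examples.
import numpy as np

def irf_2pl(theta, a, b):
    # Probability of the keyed (higher) response at trait level theta.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3.0, 3.0, 7)
print(np.round(irf_2pl(theta, a=1.2, b=0.0), 3))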
“…Facteau and Craig (2001), using confirmatory factor analysis (CFA) and item response theory (IRT) with a large dataset of 360° feedback ratings, found construct equivalence between peers, supervisors, and subordinates. Maurer, Raju, and Collins (1998), employing both CFA and IRT, found evidence of construct-level agreements between peer and subordinate ratings of performance.…”
Section: Construct-level Disagreements and Rater Reliability In Super… (mentioning)
confidence: 99%