This meta-analysis reviewed the magnitude and moderators of the relationship between rater liking and performance ratings. The results revealed substantial overlap between rater liking and performance ratings (ρ = .77). Although this relationship is often interpreted as indicative of bias, we review studies indicating that the relationship between liking and performance ratings may, to some extent, reflect "true" differences in ratee performance. Moderator analyses indicated that the relationship between liking and performance ratings was weaker for ratings of organizational citizenship behaviors, ratings made by peer raters, ratings in nonsales jobs, and ratings made for developmental purposes; however, the relationship remained strong across moderator levels, underscoring its robustness. Implications for the interpretation of performance ratings are discussed.

Performance evaluation systems are central to a cross-section of talent management functions, such as determining employee compensation and rewards, providing developmental feedback, documenting administrative decisions, planning for succession, and reinforcing organizational norms (Cascio & Aguinis, 2005). In fact, Ghorpade and Chen (1995) suggested that performance ratings are "inevitable in all organizations-large and small, public and private, local and multinational" (p. 32). Yet performance appraisals have been the subject of substantial criticism over the years. Indeed, skepticism as to the quality of the information obtained from human evaluations has persisted for nearly as long as the field of psychological measurement (Thorndike, 1925; Wells, 1907). Murphy (2008) succinctly summed up the state of affairs, noting that "performance ratings are widely viewed as poor measures of job performance" (p. 148).

Over the years, a litany of factors has been proposed as hindrances to the quality of performance ratings. The overarching theme of this school of thought is that raters introduce performance-irrelevant variance into performance ratings because they are either unable or unwilling to provide accurate ratings. Early research attributed low-quality ratings to rater ability (or, presumably, lack thereof) and sought to design better scales