“…In addition, although some studies have found that structured observation protocols can be applied with adequate reliability, others have raised questions about the consistency of ratings and the validity of scores obtained from these tools for different purposes (Harris and Sass, 2007; Henry, Murray, and Phillips, 2007; HRI, 2000; Jacob and Lefgren, 2008; Kane, Kerr, and Pianta, 2014; Learning Mathematics for Teaching Project, 2011; Medley and Coker, 1987; Piburn and Sawada, 2000; Schultz and Pecheone, 2014; Walkington and Marder, 2014). Rater error is a major threat to the reliability and validity of classroom observation scores and principal ratings, despite great effort to train raters and calibrate their scoring (Casabianca, Lockwood, and McCaffrey, 2015; Myford, 2012; Whitehurst, Chingos, and Lindquist, 2014). Using Classroom Assessment Scoring System data from the recent Measures of Effective Teaching project, Drew Gitomer and his colleagues found that observers had difficulty agreeing on ratings of certain dimensions of teaching, particularly those for which teacher performance was generally relatively weak (Gitomer et al., 2014).…”