“…In the context of the TPO assessment, the specific implication is whether the examinee responses are obtained and scored appropriately, so that the resulting score accurately represents the quality of the performance. On the specific question of automated scoring, there is rich precedent for studies that compare the results of automated and human scoring as the basis for evaluating the validity of automated scores from the Evaluation perspective (e.g., Bejar, 1991; Bennett, Sebrechts, & Marc, 1994; Braun, Bennett, Frye, & Soloway, 1990; Clauser et al., 1995; Clauser et al., 1997; Kaplan & Bennett, 1994; Page & Petersen, 1995; Sebrechts, Bennett, & Rock, 1991; Williamson et al., 1999). It is generally acknowledged that human scores are not perfect and therefore may not be an ideal basis for evaluating the quality of automated scoring, suggesting that the term gold standard may be a misnomer.…”