“…This question was addressed by analyzing the holistic scores that three raters assigned, using the rating scale, to the oral task responses of 30 test takers. Interrater reliability of the scores was estimated with a two-way mixed, average-measures intraclass correlation coefficient (ICC) for absolute agreement (Shrout & Fleiss, 1979; McGraw & Wong, 1996), as in other studies on L2 oral assessment (e.g., Lee, 2015; Park, 2015). This type of ICC was chosen over the alternatives because (a) the same set of raters rated all task responses, (b) the raters involved in the analysis were the only raters of interest (i.e., they were not randomly selected from a larger population), (c) the mean of the ratings by multiple raters is used as the basis of the assessment, and (d) absolute agreement among raters is of importance (Shrout & Fleiss, 1979; McGraw & Wong, 1996).…”
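For readers who want to see how this coefficient is obtained, the following is a minimal sketch of the average-measures, absolute-agreement ICC, written ICC(A,k) in McGraw and Wong (1996). It computes ICC(A,k) = (MS_R − MS_E) / (MS_R + (MS_C − MS_E) / n) from a two-way ANOVA on an n × k score matrix, where rows are test takers (n = 30 in this study) and columns are raters (k = 3). The function name `icc_a_k` and the toy matrices are hypothetical illustrations, not the study's data or analysis script. In practice a statistics package would typically be used instead.

```python
from typing import Sequence


def icc_a_k(scores: Sequence[Sequence[float]]) -> float:
    """Average-measures, absolute-agreement ICC, i.e. ICC(A,k) (McGraw & Wong, 1996).

    `scores` is a complete n x k matrix: rows = rated responses, columns = raters.
    (Hypothetical helper for illustration only.)
    """
    n = len(scores)       # number of rated responses (subjects)
    k = len(scores[0])    # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k matrix with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater mean differences (ms_cols) enter the denominator: this is what makes
    # the coefficient one of absolute agreement rather than consistency.
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


if __name__ == "__main__":
    print(icc_a_k([[1, 1], [2, 2], [3, 3]]))  # identical ratings -> 1.0
    print(icc_a_k([[1, 2], [2, 3], [3, 4]]))  # second rater always 1 point higher -> 0.8
```

In the second toy matrix the raters rank the responses identically, so a consistency-type ICC would equal 1. The absolute-agreement ICC drops to 0.8 because the second rater is systematically more lenient. This difference is the property that criterion (d) above relies on.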