“…In addition, studies comparing methods for determining inter-rater reliability that are based on different measurement theories have an important place in recent research. These include studies comparing methods based on classical test theory, generalizability (G) theory, many-facet Rasch measurement and the hierarchical rating model (Akın & Baştürk, 2010, 2012; Engelhard, 1994; Engelhard & Myford, 2003; Güler & Gelbal, 2010b; Güler & Teker, 2015; Iramaneerat, Myford, Yudkowsky, & Lowenstein, 2009; Iramaneerat et al., 2008; Linacre et al., 1990; Lynch & McNamara, 1998; Macmillan, 2000; Nakamura, 2000; Stenlund, 2013; Sudweeks, Reeve, & Bradshaw, 2004). Further details on these studies are not provided, since the present study aims to compare rubrics and graded-category rating scales used in scoring rather than the methods used to determine inter-rater reliability.…”