This study investigated the comparability of scores obtained from student evaluation of teaching forms. Comparability matters because higher education institutions use student ratings in decision making. Three course-related variables (grade level, course type, and course credit) were used to define student subgroups. Multi-group confirmatory factor analysis was then used to assess the invariance of factorial structure, factor loadings, and factor means across groups. Although a common factorial structure held across groups, fully invariant factor loadings were observed only across instructors who teach different course types; for the other groupings, only partial invariance of factor loadings was obtained. Factor means were not invariant in any of the subgroup comparisons, indicating possible bias. These results suggest that comparing instructors on the basis of student ratings may not be as valid as is commonly assumed.