Open-ended items are among the most commonly used methods for measuring higher-order thinking skills such as problem solving and written expression. Responses to open-ended items are evaluated with three main approaches: general evaluation, rating scales, and rubrics. Measuring and improving students' problem-solving skills first requires an error-free measurement process. Rater errors, such as bias and a tendency to score too high or too low (leniency or severity), are a common problem in evaluating open-ended items because they reduce the accuracy of the decisions based on the scores. This study used open-ended items to examine raters' scoring tendencies under three conditions: general evaluation, rating scale, and rubric. The raters' behaviours under each assessment method, and their opinions about the methods, were determined. The participants were 12 mathematics teachers, and the data were analysed with the Many-Facet Rasch Model. The scoring reliability of each method was estimated. Under the rating scale condition, the raters showed a more homogeneous scoring tendency. In addition, although most raters stated that they prefer to use a rubric, they also described it as the most difficult method to use.
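The way the Many-Facet Rasch Model separates rater severity from student ability can be illustrated with a small sketch. It assumes the rating-scale formulation, in which the log-odds of a score in category k rather than k-1 is ability minus item difficulty minus rater severity minus a step threshold. The function names and all parameter values below are hypothetical and are not estimates from this study. Estimating the parameters from real rating data, which dedicated Rasch software handles, is not shown.

```python
import math


def mfrm_category_probs(ability, item_difficulty, rater_severity, thresholds):
    """Category probabilities under an assumed rating-scale many-facet Rasch model.

    log(P_k / P_{k-1}) = ability - item_difficulty - rater_severity - F_k,
    where thresholds = [F_1, ..., F_M] and scores run from 0 to M.
    """
    eta = ability - item_difficulty - rater_severity
    # Cumulative logits: category 0 is the reference with value 0,
    # category k adds (eta - F_h) for each step h = 1..k.
    cumulative = [0.0]
    for f in thresholds:
        cumulative.append(cumulative[-1] + eta - f)
    # Normalise with a numerically stable softmax.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected rating given category probabilities for scores 0..M."""
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    steps = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a 0-3 scale
    lenient = mfrm_category_probs(0.5, 0.0, -0.5, steps)
    severe = mfrm_category_probs(0.5, 0.0, 0.5, steps)
    print(round(expected_score(lenient), 3), round(expected_score(severe), 3))
```

For the same student and item, the more severe hypothetical rater yields a lower expected score. This is the kind of rater effect that the model isolates as a separate facet instead of attributing it to the student.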