Background
The Objective Structured Clinical Examination (OSCE) is increasingly used at medical schools to assess practical competencies. To compare the outcomes of students at different medical schools, we introduced standardized OSCE stations with identical checklists.

Methods
We investigated examiner bias at standardized OSCE stations for knee- and shoulder-joint examinations, which were implemented in the surgical OSCE at five different medical schools. The assessment checklists consisted of part A for knowledge and performance of the skill and part B for communication and interaction with the patient. At each medical faculty, one reference examiner also scored independently of the local examiner. The scores from both examiners were compared and analysed for inter-rater reliability and for correlation with the examiner's level of clinical experience. Possible gender bias was also evaluated.

Results
In part A of the checklist, local examiners graded students higher than the reference examiner did; in part B, no consistent trend was observed. Inter-rater reliability was weak, and scoring correlated only weakly with the examiner's level of experience. Female examiners generally rated students higher, but male examiners scored significantly higher if the examinee was female.

Conclusions
These examiner effects, present even in standardized situations, may influence outcomes even when students perform equally well. Examiners need to be made aware of these biases prior to examining.

Electronic supplementary material
The online version of this article (doi:10.1186/s12909-017-0908-1) contains supplementary material, which is available to authorized users.