Although rater-based judgements of clinical competence are widely used, they are context sensitive and vary between individuals and institutions. To deal adequately with the unreliability of rater judgement, it is essential to evaluate the reliability of workplace rater-based assessments in the local context. Using such an approach, this study aimed primarily to quantify how much of the variation in supervisor ratings is attributable to the trainee, to determine the number of workplace assessments needed for certification of competence, and to position the findings within the known literature. This reliability study of workplace-based supervisors' assessments of trainees has a rater-nested-within-trainee design. The score variation attributable to the trainee (the variance component) was estimated for each competency item using the minimum-norm quadratic unbiased estimator (MINQUE). The variance components were then used to estimate the number of assessments needed for a reliability of 0.80. The trainee score variance for the 14 competency items ranged from 2.3% for emergency skills to 35.6% for communication skills, with an average across all competency items of 20.3%; the trainee variance for the "Overall rating" competency item was 28.8%. These variance components translated into 169, 7, 17 and 28 assessments needed for a reliability of 0.80 for emergency skills, communication skills, the average item and the overall rating, respectively. Most variation in assessment scores was due to measurement error, ranging from 97.7% for emergency skills to 63.4% for communication skills. Similar results have been demonstrated in previously published studies. In summary, supervisors' workplace-based assessments have poor overall reliability and are not suitable for use in certification processes in their current form. The marked variation in supervisors' reliability across competencies indicates that supervisors may be able to assess some competencies with acceptable reproducibility; in this case communication and possibly overall competence. However, any continued use of this assessment format for trainee competencies requires identifying what supervisors in different institutions can reliably assess, rather than continuing to impose false expectations based on unreliable assessments.
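The abstract does not state the formula used to convert a variance component into a number of assessments. As a minimal sketch, assume the standard decision-study relationship for a rater-nested-within-trainee design, in which the reliability of the mean of n assessments is the trainee variance divided by the trainee variance plus the error variance over n. The function names and the variance-share inputs below are illustrative assumptions, not the authors' code. The sketch is not claimed to reproduce the reported counts, which may reflect rounding or method details not given here.

```python
def reliability_of_mean(trainee_share: float, n: float) -> float:
    """Reliability of the mean of n assessments (assumed nested one-facet model).

    trainee_share is the proportion of total score variance attributable
    to the trainee; the remainder is treated as measurement error.
    """
    if not 0.0 < trainee_share < 1.0:
        raise ValueError("trainee_share must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    error_share = 1.0 - trainee_share
    return trainee_share / (trainee_share + error_share / n)


def assessments_needed(trainee_share: float, target: float = 0.80) -> float:
    """Number of assessments needed to reach the target reliability.

    Obtained by solving reliability_of_mean(trainee_share, n) == target for n.
    The result is left unrounded so the caller can choose a rounding rule.
    """
    if not 0.0 < trainee_share < 1.0:
        raise ValueError("trainee_share must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    error_share = 1.0 - trainee_share
    return (target / (1.0 - target)) * (error_share / trainee_share)


if __name__ == "__main__":
    # Trainee variance shares as reported in the abstract.
    reported_shares = [
        ("Emergency skills", 0.023),
        ("Communication skills", 0.356),
        ("Average of all items", 0.203),
        ("Overall rating", 0.288),
    ]
    for label, share in reported_shares:
        n = assessments_needed(share)
        print(f"{label}: about {n:.1f} assessments for reliability 0.80")
```

Under this assumed model the required number of assessments grows rapidly as the trainee share shrinks, which is consistent with the much larger sampling burden reported for emergency skills than for communication skills.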