Introduction
The graded Wolf Motor Function Test assesses upper limb function following stroke. Its clinical utility is limited by the requirement to video record performance for scoring. This study aimed to (a) assess whether video recording is required, by examining inter-rater reliability and agreement between scoring through direct observation and scoring by video; and (b) assess intra-rater reliability and agreement.

Method
A convenience sample of 30 individuals was recruited following stroke. The graded Wolf Motor Function Test was administered within 2 weeks of rehabilitation commencement and at 3 months. Two occupational therapists scored participants through either direct observation or video. Inter- and intra-rater reliability and agreement were examined for item-level and summary scores.

Results
Excellent inter-rater reliability (n = 28) was found between scoring through direct observation and by video (intraclass correlation coefficients >0.9), and excellent intra-rater reliability (n = 21) was found (intraclass correlation coefficients >0.9) for item-level and summary scores. Low agreement was found between raters at the item level. Adequate agreement was found for total functional ability, with greater measurement error found for total performance time.

Conclusion
The graded Wolf Motor Function Test is a reliable measure of upper limb function. Therapists may not need to video record the test for scoring. In view of the low item-level agreement, future studies should assess the impact of standardised training.