It is not uncommon for randomised trials in education to have the performance of sample members in national examinations as their primary outcome. In many cases examination results are available only as summary measures. Taking the example of GCSE examination results in England, this paper shows that using summary measures of an underlying score or mark, such as exam grade, complicates the design of trials and can lead to under-powered studies. Simple simulations are used to explore the consequences of powering trials to detect a difference when grades or other summary measures are the only outcome metric available, even though the effect of an intervention is primarily captured in the unknown mark or score distribution. The analysis draws on data that relate the entire distribution of marks in English language and mathematics examinations to grades. Recommendations are made to address this problem.
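
The following is a minimal sketch, not the paper's own simulation code, of the kind of exercise described above: a continuous mark is generated for two trial arms, coarsened into grades at assumed boundaries, and the estimated power of an analysis on grades is compared with an analysis on the underlying marks. The sample size, effect size, and grade boundaries are illustrative assumptions only.

```python
# Illustrative sketch: power loss from coarsening marks into grades.
# Sample size, effect size, and grade boundaries are assumed values,
# not figures taken from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

N_PER_ARM = 200                        # pupils per arm (assumed)
EFFECT = 0.2                           # intervention effect in SD units on the mark scale (assumed)
BOUNDARIES = [-1.5, -0.5, 0.5, 1.5]    # hypothetical grade boundaries on a standardised mark scale
N_SIMS = 2000
ALPHA = 0.05

def to_grade(marks):
    """Coarsen continuous marks into ordinal grade points (0..len(BOUNDARIES))."""
    return np.digitize(marks, BOUNDARIES)

hits_marks = 0
hits_grades = 0
for _ in range(N_SIMS):
    control = rng.normal(0.0, 1.0, N_PER_ARM)
    treated = rng.normal(EFFECT, 1.0, N_PER_ARM)

    # Analysis on the underlying (usually unobserved) marks
    _, p_marks = stats.ttest_ind(treated, control)
    hits_marks += p_marks < ALPHA

    # Analysis on grades only, treated here as grade points
    _, p_grades = stats.ttest_ind(to_grade(treated), to_grade(control))
    hits_grades += p_grades < ALPHA

print(f"Estimated power using marks:  {hits_marks / N_SIMS:.2f}")
print(f"Estimated power using grades: {hits_grades / N_SIMS:.2f}")
```

Under these assumptions the grade-based analysis typically shows lower estimated power than the mark-based analysis, which is the design problem the paper examines.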