School mathematics examination papers are typically dominated by short, structured items that fail to assess sustained reasoning or problem solving. One contributory factor is the need for student work to be marked reliably by a large number of markers of varied experience and competence. We report a study that tested an alternative approach to assessment, called comparative judgement, which may be a superior method for assessing open-ended questions that invite a range of unpredictable responses. An innovative problem-solving examination paper was specially designed by examiners, evaluated by mathematics teachers, and administered to 750 secondary school students of varied mathematical achievement. The students' work was then assessed by mathematics education experts using comparative judgement and, separately, a specially designed, resource-intensive marking procedure. We report two main findings. First, the examination paper writers, when freed from the traditional constraint of producing a mark scheme, designed questions that were less structured and more problem-based than is typical of current school mathematics examination papers. Second, the comparative judgement approach to assessing the student work proved successful by our measures of inter-rater reliability and validity. These findings open new avenues for how school mathematics, and indeed other areas of the curriculum, might be assessed in the future.

Keywords: assessment, problem solving, validity, comparative judgement

Typical mathematics exams are not fit for the purpose of assessing students' mathematical knowledge and skills. Analyses of the content and style of examination papers support the conjecture that mathematics examination papers comprise mainly short items that assess the rote learning of isolated facts and procedures (Berube, 2004; NCETM, 2009; Noyes, Wake, Drake & Murphy, 2011).
An example question from a recent General Certificate of Secondary Education (GCSE) examination paper, shown in Figure 1, illustrates the problem. The GCSE is a national qualification in England taken by most school leavers.

***FIGURE 1 ABOUT HERE***

At first glance, the question looks promising for assessing students' mathematical knowledge and skills. It uses a calendar context, thereby appealing to the everyday relevance of mathematics. It also builds on students' experience of a counting system grouped in 7s to introduce an interesting generality: wherever the 2 by 2 square is positioned, the provided algorithm always gives 7. However, to achieve full marks, a student need only apply the provided algorithm to the provided inputs. No explanation or proof of why the result is always 7 is required or rewarded. An efficient examination candidate can achieve full marks without noticing that there is a mathematically interesting generality at all.

The question might be improved by asking students to compute the algorithm for a few 2 by 2 squares of their own choosing, and then asking them to explain what they ...