We report the validity of a test instrument that assesses the arithmetic ability of primary students by (a) describing the theoretical model of arithmetic ability assessment using Wilson’s (2004) four building blocks of constructing measures and (b) providing empirical evidence for the validation study. The instrument consists of 21 multiple-choice questions that hierarchically evaluate the intended learning outcomes (ILOs) for arithmetic ability, based on Bloom’s cognitive taxonomy; the sample comprised 138 grade three primary students. The theoretical model describes students’ arithmetic ability on three distinct levels: solid, developing, and basic. At each level, the model describes the characteristics of the tasks that students can answer correctly. The analysis shows that the item difficulties followed the order expected from the theoretical construct map: the difficulty of each designed item aligned with the cognitive level of the student, the item difficulty distribution aligned with the structure of the person construct map, and word problems required higher cognitive abilities than calculation problems did. The findings also indicate, however, that more difficult items could be added to better differentiate students of different ability levels, and that one item should be revised to enhance the reliability and validity of the research. We conclude that the conceptualizations of such formative assessments provide meaningful information for teachers to support learning and tailor instruction.