Cognitive tasks are seldom evaluated on their ability to provide valid and reliable measurements of the constructs they are intended to measure. This scarcity of psychometric evaluations makes it challenging to assess replications of experimental effects and to relate performance in cognitive tasks to other constructs of interest. In developmental science, these issues are compounded by the often-imprecise measures derived from tasks completed by child participants. Here, we focus on the spatial arrangement method as used to assess semantic structure in children and evaluate its psychometric properties. Using a new analytic approach to capture individual variability in participants' arrangements in this task, we show that the spatial arrangement method has appropriate construct validity (β = 0.40), internal consistency (r² = 0.20), and test–retest reliability (r² = 0.41; ICC = 0.56) when used to evaluate semantic structure in U.S. children (4–9 years of age; N = 200 across 4 datasets). We discuss the implications of these findings for examining semantic structure in children and for strengthening methodological practices in developmental science more broadly.