Gestures are hand movements that are produced simultaneously with spoken language and can supplement it by representing semantic information, emphasizing important points, or showing spatial locations and relations. These features make gesture a promising tool for improving spatial thinking. Yet recent work shows that not all learners benefit equally from gesture instruction, and that this may be driven, in part, by children’s difficulty understanding what an instructor’s gesture is intended to represent. The current study directly compares instruction with gestures to instruction with plastic unit chips (Action) in a linear measurement learning paradigm aimed at teaching children the concept of spatial units. Some children performed only one type of movement, and some performed both: Action-then-Gesture [AG] or Gesture-then-Action [GA]. Children learned most from the Gesture-then-Action [GA] and Action only [A] training conditions. After controlling for initial differences in learning, children in the Gesture-then-Action [GA] condition outperformed those in all three other training conditions on a transfer task. While gesture is cognitively challenging for some learners, that challenge may be desirable: immediately following a gesture with a concrete representation that clarifies its meaning is an especially effective way to unlock the power of this spatial tool and lead to deep, generalizable learning.