Directing visual attention toward items mentioned in utterances can optimize understanding of the unfolding spoken language and the preparation of appropriate behaviors. In several languages, numeral classifiers specify semantic classes of nouns but can also function as reference trackers. Whereas all classifier types serve to single out objects for reference in the real world and may assist attentional guidance, we propose that only sortal classifiers efficiently guide visual attention, because they are inherently tied to the semantics of the nouns they classify; container classifiers, by contrast, are only pragmatically attached to their nouns, and default classifiers index a noun without specifying its semantics. Using eye tracking and the “visual world paradigm”, we had Chinese speakers (N = 20) listen to sentences and observed that they spontaneously looked at the target objects within 150 ms after the offset of the sortal classifier. With container classifiers, the same occurred after about 200 ms, and with the default classifier only after about 700 ms. This looking pattern was absent in a control group of non-Chinese speakers; the Chinese speakers' gaze behavior can therefore be ascribed only to classifier semantics and not to artifacts of the visual objects. Thus, we found that classifier type affects how rapidly listeners spontaneously look at target objects on a screen. These significantly different latencies indicate that the stronger the semantic relatedness between a classifier and its noun, the more efficient the deployment of overt attention.