Over the last decades, cognitive neuroscience has identified a distributed set of brain regions that are critical for attention - one of the key principles of adaptive behavior. A strong anatomical overlap with brain regions critical for oculomotor processes suggests a joint network for attention and eye movements. However, the role of this shared network in complex, naturalistic environments remains understudied. Here, we investigated eye movements in relation to (un)attended sentences of natural speech in simultaneously recorded eye tracking and magnetoencephalographic (MEG) data. Using temporal response functions (TRF), we show that eye gaze tracks acoustic features (envelope and acoustic onsets) of attended speech, a phenomenon we termed ocular speech tracking. Ocular speech envelope tracking even differentiates a target from a distractor in a multi speaker context and is further related to intelligibility. Moreover, we provide evidence for its contribution to neural differences in speech processing, emphasizing the necessity to consider oculomotor activity in future research and in the interpretation of neural differences in auditory cognition. Our results extend previous findings of a joint network of attention and eye movement control as well as motor theories of speech. They provide valuable new directions for research into the neurobiological mechanisms of the phenomenon, its dependence on learning and plasticity, and its functional implications in social communication.