Eye tracking technology is increasingly used to understand individuals’ non-conscious, moment-to-moment processes during video-based learning. This review evaluated 44 eye tracking studies on video-based learning conducted between 2010 and 2021. Specifically, the review sought to uncover how the use of eye tracking technology has advanced understanding of the mechanisms underlying effective video-based learning and what caution should be exercised when interpreting the findings of these studies. Four important findings emerged from the analysis: (1) not all the studies used eye tracking technology to explain the mechanisms underlying effective video-based learning, and few disentangled the complex relationship between eye tracking metrics and the cognitive activities these metrics represent; (2) emotional factors potentially help to explain the processes that facilitate video-based learning, but few studies captured learners’ emotional processes or evaluated their affective gains; (3) the ecological validity of eye tracking research on video-based learning should be improved through methods such as using eye tracking systems with a high tolerance for head movements, allowing learners to control the pacing of the video, and communicating the learning objectives of the video to participants; and (4) boundary conditions, including personal factors (e.g. age, prior knowledge) and environmental factors (e.g. the topic of videos, type of knowledge), must be considered when interpreting research findings. The findings of this review give rise to several propositions for designing and interpreting eye tracking research on video-based learning.