Audiovisual input has received increasing attention in the fields of Second Language Acquisition (SLA) and Computer-Assisted Language Learning (CALL) over the past few decades because of its vividness, authenticity, and easy accessibility. Videos with on-screen texts, a widespread form of audiovisual input in second language (L2) teaching and learning, influence various aspects of L2 learners’ performance, including their vocabulary learning. The wide application and profound influence of this kind of input call for a systematic review of this important domain of research. Accordingly, this paper reviews empirical studies on the effects of on-screen texts on L2 vocabulary learning. Specifically, it seeks to evaluate the role of different types of on-screen texts (i.e., subtitles, captions, and dual subtitles) and various modes of captions (i.e., full captions, keyword captions, glossed captions, annotated captions, and enhanced captions) in L2 vocabulary development. It also discusses other factors that co-occur with on-screen texts and influence L2 vocabulary gains from audiovisual input, such as learners’ vocabulary size, L2 proficiency, frequency of word occurrence, number of viewings, instructional strategy, and test time. Finally, some suggestions are provided for future research.