Background
Vocabulary learning in a second language (L2) involves both single words and collocations. Research indicates that L2 learners can incidentally learn single words from captioned videos, but less is known about the incidental learning of collocations. Even less is known about how learning gains for single words and collocations differ under different captioning conditions, or about the individual differences that may account for such differences.

Objectives
This study aimed to fill this gap by comparing the learning gains of single words and collocations and by investigating the influence of vocabulary knowledge (VK) and working memory (WM) on learning outcomes under three captioning conditions: full captions, keyword captions, and no captions.

Methods
The study involved 129 young Chinese ESL learners who completed vocabulary tests assessing their meaning recall before, immediately after, and 2 weeks after the study, as well as tests for VK and WM.

Results and Conclusions
The results showed that full captions are the most efficacious condition for enhancing both single word and collocation learning. The depth of VK, as well as phonological and complex WM, were significant factors in the learning of new language items.

Takeaways
Different types of captioning (full or keyword) contribute differently to the learning of various language items. Individual differences in WM and depth of VK among learners should be considered when utilizing captioned videos for language learning.