During daily communication, visual cues such as gestures accompany the speech signal and facilitate semantic processing. However, how gestures affect lexical retrieval and semantic prediction, especially in naturalistic settings, remains unclear. Here, participants watched a naturalistic multimodal narrative in which an actor narrated a story and spontaneously produced co-speech gestures. For all content words, word frequency and lexical surprisal were regressed against the EEG using temporal response functions (TRFs), which were fitted separately, additively, and interactively for words accompanied and not accompanied by gestures. Our analyses suggest that gesture robustly modulates the frequency-dependent regression N400. We also observed some evidence, based on the single-predictor model, that gesture modulates the surprisal-N400 effect. Our findings thus suggest that, at the neural level, the presence of co-speech gestures facilitates lexical retrieval and potentially semantic prediction during the processing of naturalistic multimodal stimuli.
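To make the TRF approach concrete, the following is a minimal sketch of how word-level predictors can be regressed against EEG with a time-lagged ridge regression. It is an illustration under stated assumptions, not the analysis pipeline used in this study: the function names (`lagged_design`, `fit_trf`), the impulse coding of word onsets, the ridge penalty `alpha`, and the simulated data are all hypothetical. Coding words with and without gestures as two separate predictor columns corresponds loosely to fitting them separately.

```python
import numpy as np

def lagged_design(stim, n_lags):
    """Build a design matrix of time-shifted copies of each predictor.

    stim: (n_times, n_feats) array, e.g. impulses at word onsets scaled by
    word frequency or surprisal. Columns are ordered feature-major:
    column f * n_lags + lag holds feature f delayed by `lag` samples.
    """
    n_times, n_feats = stim.shape
    X = np.zeros((n_times, n_feats * n_lags))
    for f in range(n_feats):
        for lag in range(n_lags):
            X[lag:, f * n_lags + lag] = stim[:n_times - lag, f]
    return X

def fit_trf(stim, eeg, n_lags, alpha=1.0):
    """Estimate TRFs by ridge regression (assumed estimator, for illustration).

    eeg: (n_times, n_channels) array.
    Returns an array of shape (n_feats, n_lags, n_channels).
    """
    X = lagged_design(stim, n_lags)
    # Ridge solution: (X'X + alpha * I)^-1 X'y
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, X.T @ eeg)
    return weights.reshape(stim.shape[1], n_lags, eeg.shape[1])

# Hypothetical usage with simulated data (not the study's data).
rng = np.random.default_rng(0)
n_times, n_lags, n_channels = 2000, 20, 3
onsets = np.arange(50, 1950, 40)           # simulated word onsets (samples)
freq = rng.normal(size=onsets.size)        # simulated word-frequency values
stim = np.zeros((n_times, 2))
stim[onsets[::2], 0] = freq[::2]           # column 0: words with a gesture
stim[onsets[1::2], 1] = freq[1::2]         # column 1: words without a gesture
eeg = rng.normal(size=(n_times, n_channels))
trf = fit_trf(stim, eeg, n_lags, alpha=1.0)
print(trf.shape)
```

In this sketch, comparing `trf[0]` with `trf[1]` around the N400 latency range would be one way to inspect whether the frequency response differs between words with and without gestures.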