In the era of eLearning 4.0, many researchers have suggested that multimodal input helps to enhance second language (L2) vocabulary learning. However, previous studies on the effects of multimodal teaching have failed to yield definitive conclusions. Furthermore, few studies on multimodal input in vocabulary learning have targeted junior high school students and focused on explicit vocabulary instruction in class. To explore the effects of multimodal input on English as a foreign language (EFL) learners' vocabulary learning and to identify effective methods, this study adopted a mixed-methods approach. Drawing on dual coding theory and cognitive load theory, the teaching materials were designed using resources from the multimodal corpus iWeb and other websites. A total of 60 junior high school EFL learners with similar English proficiency levels were divided into an experimental group (EG) and a control group (CG). Target words were selected through Questionnaire I. During the experiment, the CG learned from monomodal materials while the EG received multimodal input, and an immediate post-test was then administered to both groups. Questionnaire II was distributed to the EG, and five EG students were randomly selected for interviews. One week later, a delayed post-test was administered to both groups. The results showed that the EG outperformed the CG on the immediate post-test but scored lower than the CG on the delayed post-test. The questionnaire and interview results suggest that students held both positive and negative attitudes toward the multimodal input approach in vocabulary learning. The study concludes with implications for adopting a multimodal input approach in vocabulary learning, along with suggestions on how to maximize its benefits and minimize its drawbacks.