The effects of multimodal input on the acquisition of second/foreign language (L2) pragmatics are a recent area of research, and the use of eye-tracking to investigate L2 pragmatics remains limited (Godfroid, 2019). This study explored the effects of multimodal input on L2 requests among English as a Foreign Language (EFL) learners, whose viewing behavior was monitored with a webcam eye-tracker. A multiple-choice discourse completion test was administered at pretest and posttest to evaluate the effects of viewing audio-visual material with or without captions. Additionally, a subset of participants was interviewed about their pragmatic perception. Findings indicate that participants exposed to captioned videos performed better on the posttest and relied on captions while viewing, a result corroborated by the retrospective interviews.