This study investigated how learners’ previously acquired capitals influence their response to the teacher’s attempts to trigger their attention. To this end, 403 EFL learners completed three questionnaires. Structural equation modeling showed that emotional capital was a positive predictor of the interpersonal, intrapersonal, visual, musical, and kinesthetic joint attention styles. Similarly, social capital was a positive predictor of the interpersonal, verbal, visual, and musical styles, and cultural capital was a positive predictor of the logical and verbal styles. Moreover, the proposed model of L2 achievement based on the capitals and joint attention styles showed a good fit to the data. The findings suggest that learners’ socio-cultural and emotional backgrounds influence their response to the teacher’s initiation of joint attention, and that their L2 achievement is enhanced when the teacher uses different joint attention modalities. Finally, pedagogical implications and areas for further research are discussed.