This study uses multimodal interaction analysis to examine the apology acts of two Chinese EFL learners and two Bangladeshi EFL learners in face-to-face role-play tasks, with the multimodal annotation software ELAN. Radar charts are drawn to show the similarities and differences in modal density across six modal resources: gesture, head movement, gaze, facial expression, posture, and utterance. The results show that all four participants use the gaze modality with relatively high intensity, and that gaze and utterance occupy the central position in the sequence of the apology act. The Chinese learners reach their highest intensity values for each mode in the explanatory strategy, whereas the Bangladeshi learners reach theirs in the repair strategy, which indicates that the two groups attach importance to different apology strategies. The Chinese EFL learners show comparatively low modal complexity, suggesting that they do not engage in complex actions but still combine verbal and nonverbal modes to build the ongoing meaning of the conversation. The findings indicate that pragmatic competence is the ability of language users to communicate appropriately in social interaction, and that communication requires different modes to coordinate and work in concert. Applying multimodal analysis to the speech act of apology therefore offers a new paradigm for re-examining the classic study of pragmatic competence, one in which the construction and negotiation of utterance meaning can be revealed more clearly and completely.