Learning a language is a significant challenge: learners must grasp the intricacies of the language and analyze the relationships among its components. Although computer technologies have been developed to assist language learning, they fail to mirror the human cognitive process. This paper examines the application of a multimodal dialogue system to enhance language learning outcomes. The system offers two main advantages. First, smart devices can collect multimodal data in learning environments to monitor the learner's status in real time, which improves the accuracy of intention recognition. Second, the system can interact with learners naturally by analyzing their multimodal data, which helps improve their language skills. Finally, application scenarios are designed based on the defined multimodal dialogue system to demonstrate its ability to enhance language learning performance.