In language learning, adults appear to have an advantage over children. They memorize new linguistic knowledge more readily, and they bring better learning strategies, more experience, and greater intelligence to the task of integrating new knowledge. However, unless pronunciation is learned in childhood, attaining a native-like accent is almost impossible. In this research, we take Mandarin tones as a case study and use cognitive load theory to analyze why tones are difficult to learn and where common learning methods fall short. We developed a perception training app whose tasks are matched to the learner's perceptual ability, as determined through perception experiments, and are organized according to small-step learning. This app was more effective at improving tone pronunciation than existing apps that provide voice analysis functions. Optimizing the app's interface and operating procedures improved the learning effect further and substantially. However, when pronunciation practice was combined with perception training, we found that practice with insufficient feedback could lead to pronunciation errors. We therefore also investigated the use of machine learning to support pronunciation practice, aiming to train a model for designing pronunciation tasks rather than for classification. Using specially designed voice samples as training data, we trained a model for pronunciation training and demonstrated that supporting pronunciation practice with machine learning is feasible.