This paper builds and evaluates a self-learning system for spoken English pronunciation aimed at PC and mobile terminal users. Its core speech recognition technology is based on the Hidden Markov Model (HMM), which is used to decode the speech signal during spoken English learning. The paper reviews the relevant speech recognition theory and signal processing techniques, builds an English self-learning system that covers more complex situations and more user types, and evaluates the system comprehensively. The results show that the HMM in the spoken English recognition and evaluation system achieves good overall accuracy, with accuracy on the input audio above 90% for users of all ages. Among younger users, male speech signals have the highest accuracy in both closed and open spaces, reaching 98.12% and 96.53%, respectively. Accuracy also decreases gradually as more wrong judgements are made on the speech input signal: when the evaluation is poor, the accuracy of the scoring results falls to 55%, whereas with fewer voice judgement errors the evaluation is in the excellent range and the accuracy reaches 88%.