At present, most college and university students still have varying degrees of difficulty with oral expression in English. Spoken English teaching is characterized by limited class time, long training cycles, and demanding practice, and this study applies deep encoding attention to college spoken English teaching with these constraints in mind. It proposes the CIASC-BiLSTM model, which integrates the deep encoding attention mechanism through a cross-fertilization layer so that the attention mechanisms for intent recognition and semantic slot filling supply explicit, associated supplementary information to each other. The ASPAN dual-attention model is trained with adversarial training on two public benchmark corpora. To verify the effectiveness of the innovative spoken English teaching model based on deep encoding attention, this paper examines changes in students’ spoken English proficiency through teaching experiments. The results show that the significance value for the difference between the experimental class and the control class in the speaking post-test is 0.026, below the 0.05 threshold, and that the experimental class improves significantly over the control class. The significance values of the pre-test and post-test t-tests for every dimension of speaking ability in the experimental class are also below 0.05, indicating that innovative speaking instruction based on deep encoding attention achieves good results.