As exchange and communication between countries become increasingly frequent, converting speech between languages has become a difficult problem. Analyzing the problems that arise in cross-language speech conversion, studying the conversion path, and exploring the motivation for innovation on the basis of deep learning theory for cross-language transfer therefore have both theoretical and practical significance. This paper addresses a technical difficulty in existing speech conversion methods: effectively exploiting both the local pattern information in the time-frequency spectrum of the signal and the long-term correlation of the speech signal. A speech conversion method based on a convolutional recurrent neural network model is proposed. In the model, an extended convolutional neural network is used to model the long-term correlation of speech signals. For fundamental frequency estimation, the prosodic information obtained by decomposing the fundamental frequency with the continuous wavelet transform is used as the training target of the fundamental frequency estimation model. Experimental results show that speech converted by the proposed convolutional recurrent network method has better quality and intelligibility than speech converted by the comparison method.
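To illustrate how a convolutional stack can capture long-term correlation, the following is a minimal sketch that assumes the "extended" convolution mentioned above refers to dilated convolution, in which kernel taps are spaced several frames apart. The kernel size, the dilation rates, and the function names `dilated_conv1d` and `receptive_field` are hypothetical choices made for illustration and are not taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Causal 1-D convolution whose kernel taps are `dilation` frames apart."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # look back k * dilation frames
            if idx >= 0:            # frames before the start count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input frames that influence one output frame of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings (assumption, not from the paper): four layers,
# kernel size 2, dilation doubling at each layer.
dilations = [1, 2, 4, 8]

# Push a unit impulse through the stack to see how far its influence spreads.
frames = [0.0] * 16
frames[0] = 1.0
response = frames
for d in dilations:
    response = dilated_conv1d(response, [1.0, 1.0], d)
```

Under these assumed settings, four layers with only two taps each cover 16 consecutive frames, so the context grows exponentially with depth while the parameter count grows linearly.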