Monocular scene flow estimation recovers 3D structure and 3D motion from consecutive monocular images. Previous monocular scene flow methods have typically focused on enhancing image features and motion features directly, while neglecting how these features are exploited in the decoder, which is equally crucial for accurate scene flow estimation. In this paper, we propose a global feature perception module (GFPM) based on cross-covariance attention and apply it to the decoder, enabling the decoder to effectively exploit the motion and image features of the current layer as well as the coarse scene flow estimate from the previous layer, thus enhancing the decoder's recovery of 3D motion information. In addition, we propose a parallel architecture of self-attention and convolution (PCSA) for feature extraction, which strengthens the global representation capability of the extracted image features. Our proposed method demonstrates strong performance on the KITTI 2015 dataset, achieving a relative improvement of 17.6\% over the baseline approach, and achieves competitive results compared to other recent methods.
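
Since the GFPM is built on cross-covariance attention, the sketch below illustrates that underlying mechanism (as introduced in XCiT: attention computed over feature channels rather than tokens, so cost scales linearly with token count). This is a minimal PyTorch illustration of the generic operation, not the paper's GFPM implementation; the class name, head count, and tensor shapes are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossCovarianceAttention(nn.Module):
    """Minimal sketch of cross-covariance attention (XCA).

    The attention map is (C/h x C/h) per head, i.e. channel-by-channel,
    instead of the (N x N) token-by-token map of standard self-attention.
    Hyperparameters are illustrative, not taken from the paper.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Learnable per-head temperature, as in XCiT.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) -- N tokens, e.g. flattened feature-map locations.
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)  # each: (B, heads, C/heads, N)

        # L2-normalize along the token dimension so the channel-to-channel
        # map is a cosine (cross-covariance) similarity.
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)

        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, heads, C/h, C/h)
        attn = attn.softmax(dim=-1)

        out = attn @ v                        # (B, heads, C/heads, N)
        out = out.permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)

# Usage: output shape matches input, regardless of token count N.
x = torch.randn(2, 1024, 128)
y = CrossCovarianceAttention(dim=128)(x)
assert y.shape == x.shape
```

Because the channel dimension is fixed while N grows with image resolution, this formulation keeps attention affordable on dense feature maps, which is one reason it suits decoder-side fusion of motion and image features.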