Compressing remote sensing images with high spatial and spectral resolution plays an important role in subsequent image processing and information acquisition. Accurate data modeling helps the entropy model estimate entropy more precisely, and better image recovery requires making full use of the prior information contained in the latent representation. To achieve global association and hierarchical modeling of latent elements, this paper proposes adding a global anchored-stripe self-attention mechanism that captures global, local, and interchannel dependencies. To enhance the feature extraction capabilities of the encoder and decoder, a multiscale attention module based on depthwise convolution is used to enlarge the receptive field and strengthen the nonlinear transformation, ensuring that the network retains more useful information. We evaluate the compression performance of the proposed method in terms of rate-distortion curves and running speed. Comparative experiments on the DOTA, LoveDA, and UC-Merced datasets show that the proposed method runs faster than the context model. It outperforms traditional compression methods such as BPG, WebP, and JPEG2000, as well as state-of-the-art deep learning-based methods, in terms of PSNR and MS-SSIM. In terms of perceptual quality, adding a perceptual loss reduces the blurring caused by the MSE loss, and the proposed method achieves better perceptual quality at comparable bpp.