This paper addresses the trade-off between desktop image quality and bandwidth consumption in remote GUI desktop scenarios in the Internet of Things (IoT). Remote desktop tools, which are crucial for work efficiency, typically rely on image compression to limit bandwidth usage. JPEG is widely used because it removes redundancy efficiently, but its image quality degrades as the compression ratio increases. Recently, deep learning-based compression techniques have emerged that challenge traditional methods such as JPEG. This study proposes an optimized RFB (Remote Frame Buffer) protocol built on a convolutional neural network (CNN) image compression algorithm that takes human visual perception of desktop images into account. Compared with the unoptimized RFB protocol, the improved protocol reduces bandwidth consumption by 30–80% and delivers higher-quality remote desktop images, as shown by higher PSNR and MS-SSIM values between the remote desktop image and the original image.
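PSNR, one of the two quality metrics cited above, compares the received remote desktop image with the original through the mean squared error (MSE) of their pixels: PSNR = 10 · log10(MAX² / MSE). The sketch below is a minimal illustration only, assuming 8-bit pixels (MAX = 255) flattened into plain Python sequences; the function name `psnr` is hypothetical and this is not the evaluation code used in this study. MS-SSIM, the multi-scale structural similarity metric, is not shown.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], received: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    Hypothetical helper: assumes 8-bit pixel values by default (max_value = 255).
    """
    if len(original) != len(received) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, received)) / len(original)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)
```

For example, `psnr([0, 0, 0, 0], [0, 0, 0, 16])` yields an MSE of 64 and a PSNR of about 30.07 dB; a higher value indicates that the received image is closer to the original.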