Traditional image rendering methods are time-consuming and complex, and they cannot meet the demands of modern design applications. To address this problem, this paper applies two artificial intelligence (AI) techniques, the deep autoencoder and the capsule network, to the study of automatic image rendering. First, a Scharr filter is applied during image reconstruction in the Stacked Capsule Autoencoder (SCAE) model to improve the accuracy of image target detection and reduce the reconstruction loss. Second, the loss function of the model is improved. Finally, the performance of the improved model is tested on the Modified National Institute of Standards and Technology (MNIST) dataset, the Canadian Institute for Advanced Research-10 (CIFAR-10) dataset, the Canadian Institute for Advanced Research-100 (CIFAR-100) dataset, and a dataset built by the author. The test results show that the improved model achieves higher accuracy than the traditional K-means algorithm, the unsupervised Autoencoder network (AE), and an improved unsupervised deep clustering model. Incorporating the Scharr filter into the stacked capsule autoencoder makes the model's classification more accurate. The improved algorithm provides a reference for improving the efficiency of image rendering in virtual environments.
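The abstract does not specify how the Scharr filter is wired into the SCAE reconstruction step, so the following is only a minimal sketch of the standard 3x3 Scharr operator itself, which computes an image's gradient magnitude (its edge response). The function name `scharr_magnitude`, the plain-list image representation, and the replicate-edge border handling are assumptions made for illustration, not the paper's implementation.

```python
import math

# Standard 3x3 Scharr kernels for the horizontal (x) and vertical (y)
# derivatives, in one common sign convention.
SCHARR_X = [[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]]
SCHARR_Y = [[-3, -10, -3], [0, 0, 0], [3, 10, 3]]


def scharr_magnitude(img):
    """Return the Scharr gradient magnitude of a 2-D grayscale image.

    `img` is a list of equal-length rows of numbers (hypothetical input
    format). Borders are handled by replicating the nearest edge pixel,
    an assumption made for this sketch.
    """
    h, w = len(img), len(img[0])

    def pixel(r, c):
        # Clamp coordinates so out-of-range reads reuse the edge pixel.
        r = min(max(r, 0), h - 1)
        c = min(max(c, 0), w - 1)
        return img[r][c]

    out = []
    for r in range(h):
        row = []
        for c in range(w):
            # Correlate the 3x3 neighbourhood with each Scharr kernel.
            gx = sum(SCHARR_X[i][j] * pixel(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            gy = sum(SCHARR_Y[i][j] * pixel(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            # Combine both directional derivatives into one edge strength.
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out
```

For example, `scharr_magnitude([[0, 0, 10, 10]] * 4)` returns rows of `[0.0, 160.0, 160.0, 0.0]`: the response is zero in the flat regions and large on either side of the vertical step. Compared with the Sobel operator, the Scharr weights (3, 10, 3) give a more rotationally consistent gradient estimate, which is the usual motivation for preferring it when edge detail matters.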