This study aims to improve the quality of multi-view video and proposes a new method that integrates a convolutional neural network (CNN), visual saliency detection, and image enhancement theory. The experimental results show that both the proposed visual saliency detection model and the convolution filter sensor improve on existing approaches. The visual saliency detection model accurately locates the key features of an image and thus provides precise targets for the subsequent enhancement step. The convolution filter sensor raises the peak signal-to-noise ratio (PSNR) of the enhanced image, narrowing the gap with the original image and improving the visual effect. Supplementary experiments further verify the effectiveness of the method: in quantitative comparisons using SSIM and MS-SSIM, it outperforms existing methods on several datasets, showing a robust video quality enhancement effect. These results provide empirical support for the superiority and robustness of the method in multi-view video quality enhancement and suggest that it could have an important impact in practical applications.
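
The quality metrics behind such a comparison can be sketched in a few lines. The following is a minimal illustration, not the evaluation code used in this study: it computes the peak signal-to-noise ratio (PSNR) and a simplified, single-window (global) SSIM between a reference frame and an enhanced frame. The function names, the 8-bit dynamic range, and the use of global statistics instead of the usual sliding window are assumptions made for illustration; MS-SSIM, which combines SSIM over several scales, is omitted.

```python
import math


def _flatten(frame):
    """Flatten a 2-D grayscale frame (list of rows) into a flat list of floats."""
    return [float(p) for row in frame for p in row]


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    ref, enh = _flatten(reference), _flatten(enhanced)
    mse = sum((r - e) ** 2 for r, e in zip(ref, enh)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(reference, enhanced, max_value=255.0, k1=0.01, k2=0.03):
    """Simplified SSIM using whole-frame statistics (assumption: no sliding window)."""
    ref, enh = _flatten(reference), _flatten(enhanced)
    n = len(ref)
    mu_r = sum(ref) / n
    mu_e = sum(enh) / n
    var_r = sum((r - mu_r) ** 2 for r in ref) / n
    var_e = sum((e - mu_e) ** 2 for e in enh) / n
    cov = sum((r - mu_r) * (e - mu_e) for r, e in zip(ref, enh)) / n
    # Stabilising constants from the standard SSIM definition.
    c1 = (k1 * max_value) ** 2
    c2 = (k2 * max_value) ** 2
    numerator = (2 * mu_r * mu_e + c1) * (2 * cov + c2)
    denominator = (mu_r ** 2 + mu_e ** 2 + c1) * (var_r + var_e + c2)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical 2x3 grayscale frames, for illustration only.
    original = [[10, 50, 90], [130, 170, 210]]
    enhanced = [[12, 49, 91], [128, 171, 209]]
    print("PSNR (dB):", psnr(original, enhanced))
    print("global SSIM:", global_ssim(original, enhanced))
```

In practice a windowed implementation, such as `structural_similarity` in scikit-image, would normally be used; the global variant above only conveys the structure of the index.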