With the development of computer vision technology, market demand for accurate recognition of stereoscopic image quality is increasing. Accurate quality recognition is important both for providing high-value intelligent image services and for image analysis and computer vision applications: in fields such as autonomous driving, medical image analysis, and industrial inspection, reliable stereoscopic image quality gives algorithms trustworthy input and improves the accuracy of analysis and recognition. This study addresses image quality evaluation and aims to find higher-performance stereoscopic image quality evaluation methods. Drawing inspiration from ensemble learning, it designs two Convolutional Neural Network (CNN) stereoscopic image quality evaluation methods, based on the semantic features of stereoscopic images and on a local detail perception module, and fuses them to form a hybrid evaluation model. This is one of the main contributions of this article. To evaluate the performance of the designed models, testing experiments were conducted on the LIVE 3D Phase I dataset.
The experimental results show that, with 500 test samples, the overall Spearman rank-order correlation coefficient (SROCC) and Pearson linear correlation coefficient (PLCC) of the designed ICNN1 and ICNN2 stereoscopic image quality evaluation models are 0.940 and 0.949, and 0.940 and 0.949, respectively. These values are significantly higher than those of the deep learning and machine learning models selected for comparison. In addition, the designed models require relatively little computation time but consume a large amount of memory, which is one of their main shortcomings relative to other studies. In summary, the models designed in this study have great application potential for improving the accuracy of stereoscopic image quality recognition, and are particularly suitable for the Chinese visual design industry. Future research can further explore market-oriented applications of these recognition models.
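For readers unfamiliar with the two reported metrics, the following minimal Python sketch shows how SROCC and PLCC can be computed from predicted quality scores and subjective scores. It is an illustration under stated assumptions, not the evaluation code of this study: the function names (`plcc`, `average_ranks`, `srocc`) and the sample scores are hypothetical, and no regression between predictions and subjective scores is applied before computing PLCC, since the text does not state whether the study uses one.

```python
from math import sqrt


def plcc(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Hypothetical scores, for illustration only (not data from this study).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [1.0, 4.0, 9.0, 16.0, 25.0]

print("SROCC:", srocc(predicted, subjective))
print("PLCC:", plcc(predicted, subjective))
```

In this toy example the predictions are perfectly monotonic in the subjective scores but not linear in them, so SROCC reaches 1 while PLCC stays below 1. This is why the two coefficients are usually reported together: SROCC measures prediction monotonicity and PLCC measures linear agreement.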