Objective. Humans perceive stereoscopic image quality through the cerebral visual cortex, a complex form of brain activity. Unlike previous stereoscopic image quality assessment methods, which rely solely on extracting image features, we aim to evaluate stereoscopic image quality more accurately by replicating in a machine the human perception of image quality captured by electroencephalogram (EEG) signals. Approach. The proposed method is built on a novel image-to-brain (I2B) cross-modality model comprising a spatial-temporal EEG encoder (STEE) and an I2B deep convolutional generative adversarial network (I2B-DCGAN). Specifically, STEE first learns EEG representations that serve as the real samples for I2B-DCGAN, which extracts both quality and semantic features from the stereoscopic images through a semantic-guided image encoder and uses a generator to conditionally synthesize the corresponding EEG features for the images. Finally, the generated EEG features are classified to predict the perceptual quality level of the image. Main results. Extensive experiments on the collected brain-visual multimodal stereoscopic image quality ranking database demonstrate that the proposed I2B cross-modality model better emulates the visual perception mechanism of the human brain and outperforms other methods, achieving an average accuracy of 95.95%. Significance. The proposed method can convert learned stereoscopic image features into brain representations without requiring EEG signals at test time. Further experiments verify that the method generalizes well to new datasets and has potential for practical application.
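To make the pipeline in the Approach concrete, the following is a minimal PyTorch sketch of the I2B architecture as described above. Module internals, layer sizes, the number of quality levels, and the dummy test-time usage are illustrative assumptions only, not the authors' implementation.

```python
# Illustrative sketch of the I2B pipeline (assumed details, not the original code).
import torch
import torch.nn as nn

class STEE(nn.Module):
    """Spatial-temporal EEG encoder: maps raw EEG (channels x time) to a feature vector."""
    def __init__(self, n_channels=32, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, stride=2, padding=3),  # temporal filtering
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, eeg):          # eeg: (B, n_channels, n_samples)
        return self.net(eeg)         # real EEG features used to train the GAN

class ImageEncoder(nn.Module):
    """Semantic-guided image encoder: extracts quality and semantic features from a stereo pair."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1),  # 6 = left + right RGB views
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, stereo_pair):  # stereo_pair: (B, 6, H, W)
        return self.net(stereo_pair)

class Generator(nn.Module):
    """I2B-DCGAN generator: conditionally creates EEG-like features from image features."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, img_feat):
        return self.net(img_feat)

class Discriminator(nn.Module):
    """Distinguishes real EEG features (from STEE) from generated ones during training."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feat):
        return self.net(feat)

class QualityClassifier(nn.Module):
    """Predicts the perceptual quality level from the generated EEG features."""
    def __init__(self, feat_dim=128, n_levels=5):  # n_levels is an assumed value
        super().__init__()
        self.net = nn.Linear(feat_dim, n_levels)

    def forward(self, feat):
        return self.net(feat)

# Test-time usage: only images are needed, no EEG recording.
image_encoder, generator, classifier = ImageEncoder(), Generator(), QualityClassifier()
stereo_batch = torch.randn(4, 6, 64, 64)                  # dummy stereo pairs
quality_logits = classifier(generator(image_encoder(stereo_batch)))
print(quality_logits.shape)                                # (4, n_levels)
```

The last four lines illustrate the property highlighted under Significance: at test time the image features are mapped to brain-like representations by the generator, so no EEG acquisition is required.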