To organize design elements effectively in virtual reality (VR) scene design and to provide methods for evaluating the design process, we built a cognitive model of the user's imagery space based on perceptual engineering and used it to optimize the VR interface. First, we applied the Kansei Engineering (KE) method to study how user cognition couples with the design features of the VR system. Quantification Theory Type I and regression analysis of the KE model were then used to analyze the design elements of the VR system's human–computer interaction interface. Combined with the complex network method, we summarized the relationships among design features and identified the design features that most strongly affect users' perceptual imagery. Next, we used a convolutional neural network (CNN) to predict users' perceptual imagery of the VR system, to support design optimization. Finally, we verified the validity and feasibility of the approach by applying it to the human–machine interface design of a VR system. In the feasibility analysis of the KE model, the multivariate regression results for the VR imagery space agreed with the experimental test results to a similarity of approximately 97%, with a small error, indicating that the VR imagery space model was well correlated with the measured data. The CNN prediction model achieved a measured mean square error (MSE) of 0.0074, which is below 0.01. These results show that the method can improve the effectiveness and feasibility of design schemes: designers can use the key design feature elements to guide VR system optimization and use the CNN model to predict users' imagery values, improving design efficiency.
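Quantification Theory Type I is, in essence, a linear regression on dummy-coded categorical design elements. The sketch below shows that idea together with the MSE metric, $\mathrm{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, the same formula behind the reported 0.0074 for the CNN. It is a minimal illustration under stated assumptions, not the authors' implementation. The design items (`layout`, `color`), their categories, the imagery scores, and all function names are hypothetical. Each item's first category is assumed to serve as the zero reference. The per-item range of category scores is used as an importance indicator, a common Type I convention that the study may or may not have used.

```python
"""Minimal Quantification Theory Type I sketch (dummy-variable least squares).

Hypothetical example: items, categories, and scores are invented for
illustration and are not data from the study.
"""


def dummy_columns(items):
    # The first category of each item is the reference (all zeros), so only
    # the remaining categories get a 0/1 indicator column.
    return [(item, cat) for item, cats in items.items() for cat in cats[1:]]


def design_matrix(samples, columns):
    # Each row: intercept 1.0 followed by one 0/1 indicator per dummy column.
    return [[1.0] + [1.0 if s[item] == cat else 0.0 for item, cat in columns]
            for s in samples]


def solve(a, b):
    # Gaussian elimination with partial pivoting for the square system a x = b.
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_qt1(samples, scores, items):
    columns = dummy_columns(items)
    X = design_matrix(samples, columns)
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, scores)) for i in range(k)]
    beta = solve(xtx, xty)
    preds = [sum(b * v for b, v in zip(beta, row)) for row in X]
    # Mean square error between observed and fitted imagery scores.
    mse = sum((y - p) ** 2 for y, p in zip(scores, preds)) / len(scores)
    # Category scores per item (reference fixed at 0.0); the range max - min
    # indicates how strongly that item drives the imagery score.
    coef = dict(zip(columns, beta[1:]))
    ranges = {}
    for item, cats in items.items():
        vals = [0.0] + [coef[(item, c)] for c in cats[1:]]
        ranges[item] = max(vals) - min(vals)
    return beta, mse, ranges


# Hypothetical usage: two design items, six interface variants.
items = {"layout": ["grid", "radial"], "color": ["warm", "cool", "neutral"]}
samples = [{"layout": l, "color": c} for l in items["layout"] for c in items["color"]]
scores = [3.0, 4.0, 2.5, 3.5, 4.5, 3.0]
beta, mse, ranges = fit_qt1(samples, scores, items)
```

In this toy data the scores are exactly additive, so the fit recovers an intercept of 3.0 and category scores of 0.5 (radial), 1.0 (cool), and -0.5 (neutral), with an MSE of essentially zero. `color` then has the larger range and would be read as the more influential design item. Real imagery ratings would leave a nonzero residual error.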
Given the same design task requirements for a VR system interface, we compared a traditional design scheme with one optimized by this method. The optimized scheme better matched the users' perceptual imagery indices, and the users' task operation experience was correspondingly better.