This work proposes an innovative method for evaluating user engagement that combines the User Engagement Scale (UES) questionnaire with a facial expression recognition (FER) system, two active research topics of increasing interest in the human–computer interaction (HCI) domain. The subject of the study is a 3D simulator that reproduces a virtual FabLab in which users can approach and learn 3D modeling software and 3D printing. During the interaction with the virtual environment, a structured-light camera acquires the participant's face in real time, capturing their spontaneous reactions so that these can be compared with the answers to the UES closed-ended questions. FER methods help overcome some intrinsic limitations of questionnaire-based methods, such as insincere answers from interviewees and the lack of correspondence between those answers and facial expressions or body language. A convolutional neural network (CNN) was trained on the Bosphorus database (DB) to perform expression recognition and to classify the video frames into three engagement classes (deactivation, average activation, and activation) according to Russell's model of emotion. The results show that the two methodologies can be integrated to evaluate user engagement, combining weighted answers with spontaneous reactions and increasing the knowledge available for the design of new products and services.
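To make the frame-classification step concrete, the sketch below shows how per-frame expression labels predicted by a CNN could be mapped onto the three engagement classes via the arousal axis of Russell's circumplex model. This is a minimal illustration, not the paper's implementation: the label set assumes the basic expressions annotated in the Bosphorus DB, and the specific label-to-class assignment in `EXPRESSION_TO_ENGAGEMENT` is a plausible example rather than the authors' exact mapping.

```python
from collections import Counter

# Illustrative mapping from basic-expression labels (as annotated in the
# Bosphorus DB) to the three engagement classes derived from the arousal
# axis of Russell's circumplex model. The exact assignment used in the
# study may differ; high-arousal expressions map to "activation",
# low-arousal ones to "deactivation".
EXPRESSION_TO_ENGAGEMENT = {
    "happiness": "activation",
    "surprise":  "activation",
    "anger":     "activation",
    "fear":      "activation",
    "disgust":   "average activation",
    "neutral":   "average activation",
    "sadness":   "deactivation",
}


def classify_frames(frame_labels):
    """Map per-frame CNN expression predictions to engagement classes
    and return the distribution over the three classes."""
    classes = [EXPRESSION_TO_ENGAGEMENT[label] for label in frame_labels]
    return Counter(classes)


if __name__ == "__main__":
    # Hypothetical sequence of labels predicted for a short interaction.
    labels = ["neutral", "neutral", "happiness", "surprise", "neutral"]
    print(classify_frames(labels))
    # Counter({'average activation': 3, 'activation': 2})
```

Aggregating the per-frame classes over a session in this way yields an engagement profile that can be set against the weighted UES answers for the same participant.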