Measuring customer satisfaction from facial expressions in video surveillance footage can potentially support real-time analysis. We propose the use of a deep residual network (ResNet), which has been widely used for many image recognition tasks but not, to date, for recognizing facial expressions in video surveillance. A key challenge in collecting video surveillance data in an airport context is achieving a balanced distribution across all emotions, as most passengers' faces are either neutral or happy. No existing work has established the feasibility of addressing this imbalance by training the model on datasets from other domains. This paper is the first to investigate the benefits of a residual training approach and of adopting a network pre-trained on similar tasks to reduce training time. Based on comprehensive experiments comparing domain-specific, cross-domain, and mixed-domain training and testing approaches, we confirm the value of augmenting the surveillance domain with datasets from other domains (CK+, JAFFE, AffectNet).
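The transfer-learning setup described above can be illustrated with a minimal sketch, assuming PyTorch/torchvision, a ResNet-18 backbone with ImageNet weights, and seven basic emotion classes; the paper does not specify the exact network depth, source weights, or which layers are frozen, so these choices are illustrative only.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 backbone pre-trained on ImageNet (assumed; the exact
# depth and source weights are not specified in the abstract).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Replace the final fully connected layer with a 7-way emotion classifier
# (the basic emotion categories used by datasets such as CK+ and JAFFE).
num_emotions = 7
model.fc = nn.Linear(model.fc.in_features, num_emotions)

# Freeze the earlier residual blocks so only the last block and the new
# head are fine-tuned on the mixed-domain data, reducing training time.
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```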