This work presented the first study on mouth-based emotion recognition adopting a transfer learning approach. Transfer learning results are paramount for mouth-based emotion recognition, because few datasets are available, and most of them include emotional expressions simulated by actors rather than real-world categorisation. Using transfer learning, we can train with far less data than a whole network built from scratch requires, and thus fine-tune the network with emotional data more efficiently and improve the convolutional neural network's accuracy in the target domain. The proposed approach aims to improve emotion recognition dynamically, taking into account not only new scenarios but also situations that differ from the initial training phase, because an image of the mouth can be available even when the whole face is visible only from an unfavourable perspective. Typical applications include the automated supervision of bedridden critical patients in a healthcare management environment, and portable applications supporting disabled users who have difficulties seeing or recognising facial emotions. This achievement builds on previous preliminary work on mouth-based emotion recognition using deep learning, and has the further benefit of having been tested against a set of other networks on an extensive face-based emotion recognition dataset that is well known in the literature. We also compared the accuracy of mouth-based emotion recognition to the corresponding full-face emotion recognition, and found that the loss in accuracy is largely compensated by consistent performance in the visual emotion recognition domain. We can therefore state that our method demonstrates the importance of mouth detection in the complex process of emotion recognition.
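To make the fine-tuning step concrete, the sketch below shows one common way to adapt a pretrained CNN to a small emotion dataset: freeze the generic convolutional features and retrain only a new classification head. The backbone choice (ResNet-18), the six-class label set, and all hyperparameters are illustrative assumptions for this sketch, not the exact configuration evaluated in this work.

import torch
import torch.nn as nn
from torchvision import models

NUM_EMOTIONS = 6  # hypothetical label set, e.g. the basic emotions

# Load a network pretrained on a large generic dataset (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the convolutional layers so the generic visual features
# learned from the source domain are preserved during fine-tuning.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for the emotion
# labels; its parameters are newly initialised and thus trainable.
model.fc = nn.Linear(model.fc.in_features, NUM_EMOTIONS)

# Optimise only the trainable parameters: far fewer labelled
# examples are needed than when training the network from scratch.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
criterion = nn.CrossEntropyLoss()

def fine_tune_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step on a batch of mouth crops and emotion labels."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

Freezing the backbone is only one option; unfreezing the last convolutional block with a lower learning rate is a common variant when slightly more emotional training data is available.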