Multimodal emotion recognition has gained traction in affective computing research community to overcome the limitations posed by the processing a single form of data and to increase recognition robustness. In this study, a novel emotion recognition system is introduced, which is based on multiple modalities including facial expressions, galvanic skin response (GSR) and electroencephalogram (EEG). This method follows a hybrid fusion strategy and yields a maximum one-subject-out accuracy of 81.2% and a mean accuracy of 74.2% on our bespoke multimodal emotion dataset (LUMED-2) for 3 emotion classes: sad, neutral and happy. Similarly, our approach yields a maximum one-subject-out accuracy of 91.5% and a mean accuracy of 53.8% on the Database for Emotion Analysis using Physiological Signals (DEAP) for varying numbers of emotion classes, 4 in average, including angry, disgust, afraid, happy, neutral, sad and surprised. The presented model is particularly useful in determining the correct emotional state in the case of natural deceptive facial expressions. In terms of emotion recognition accuracy, this study is superior to, or on par with, the reference subject-independent multimodal emotion recognition studies introduced in the literature.