Emotion charting using multimodal signals has gained great demand for stroke-affected patients, for psychiatrists while examining patients, and for neuromarketing applications. Multimodal signals for emotion charting include electrocardiogram (ECG) signals, electroencephalogram (EEG) signals, and galvanic skin response (GSR) signals. EEG, ECG, and GSR are also known as physiological signals, which can be used for identification of human emotions. Due to the unbiased nature of physiological signals, this field has become a great motivation in recent research as physiological signals are generated autonomously from human central nervous system. Researchers have developed multiple methods for the classification of these signals for emotion detection. However, due to the non-linear nature of these signals and the inclusion of noise, while recording, accurate classification of physiological signals is a challenge for emotion charting. Valence and arousal are two important states for emotion detection; therefore, this paper presents a novel ensemble learning method based on deep learning for the classification of four different emotional states including high valence and high arousal (HVHA), low valence and low arousal (LVLA), high valence and low arousal (HVLA) and low valence high arousal (LVHA). In the proposed method, multimodal signals (EEG, ECG, and GSR) are preprocessed using bandpass filtering and independent components analysis (ICA) for noise removal in EEG signals followed by discrete wavelet transform for time domain to frequency domain conversion. Discrete wavelet transform results in spectrograms of the physiological signal and then features are extracted using stacked autoencoders from those spectrograms. A feature vector is obtained from the bottleneck layer of the autoencoder and is fed to three classifiers SVM (support vector machine), RF (random forest), and LSTM (long short-term memory) followed by majority voting as ensemble classification. The proposed system is trained and tested on the AMIGOS dataset with k-fold cross-validation. The proposed system obtained the highest accuracy of 94.5% and shows improved results of the proposed method compared with other state-of-the-art methods.