Stress has a significant negative impact on people and has become a major social concern. Early stress detection is essential for effective stress management. This study proposes a Deep Learning (DL) method for stress detection using multimodal physiological signals, namely electrocardiogram (ECG) and electrodermal activity (EDA). The rich latent feature representations learned by DL models have not yet been fully explored for this task. Hence, this paper proposes a hierarchical autoencoder feature fusion in the frequency domain. The latent representations from different layers of autoencoders (AEs) are combined and fed to a classifier, a Convolutional Recurrent Neural Network with Squeeze-and-Excitation (CRNN-SE). Two sets of comparisons are performed: (i) performance on frequency band features versus raw data, and (ii) performance of autoencoders trained with three cost functions, Mean Squared Error (MSE), Kullback-Leibler (KL) divergence, and cosine similarity, on both frequency band features and raw data. To verify the generalizability of our approach, we tested it on four benchmark datasets: WAUC, CLAS, MAUS, and ASCERTAIN. Results show that frequency band features outperform raw data by 4-8%. MSE loss produces better results than the other losses by 3-7% for both frequency band features and raw data. The proposed approach outperforms existing subject-independent stress detection models by 1-2%.
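The feature fusion and the three reconstruction losses compared above can be illustrated with a small pure-Python sketch. This is a hedged illustration under stated assumptions, not the authors' implementation: the encoder uses hypothetical toy dense layers with ReLU activations, "hierarchical fusion" is assumed to mean plain concatenation of every encoder layer's output, the cosine cost is assumed to be 1 minus cosine similarity, and KL divergence is computed after normalizing non-negative vectors (such as band powers) into probability distributions. The CRNN-SE classifier, decoder, and training loop are omitted.

```python
import math
from typing import List, Sequence, Tuple

# A dense layer is (weight rows, bias vector); all values used with it are hypothetical.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def dense_relu(x: Sequence[float], layer: Layer) -> List[float]:
    """Apply ReLU(W x + b) for one fully connected encoder layer."""
    weights, bias = layer
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def hierarchical_latents(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Run x through the encoder and concatenate every layer's output.

    Assumption: hierarchical fusion is modeled as concatenating the latent
    vectors of all encoder layers, from shallow to deep.
    """
    fused: List[float] = []
    h = list(x)
    for layer in layers:
        h = dense_relu(h, layer)
        fused.extend(h)  # keep this layer's latent representation
    return fused


def mse_loss(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) after normalizing non-negative vectors to sum to 1."""
    sp, sq = sum(p), sum(q)
    total = 0.0
    for pi, qi in zip(p, q):
        pn, qn = pi / sp, qi / sq
        if pn > 0.0:  # terms with p = 0 contribute 0 by convention
            total += pn * math.log((pn + eps) / (qn + eps))
    return total


def cosine_loss(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """1 - cosine similarity, so vectors pointing the same way give a loss of 0."""
    dot = sum(a * b for a, b in zip(x, x_hat))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in x_hat))
    return 1.0 - dot / norm
```

For example, with a 3-input encoder whose two layers have widths 2 and 1, `hierarchical_latents` returns a 3-element fused vector (2 + 1 values) that would then be passed to the classifier. Concatenating shallow and deep layers keeps both low-level and more abstract features available to the downstream model, which is the intuition behind fusing multiple latent layers rather than using only the bottleneck.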