Smile detection plays a key role in recognizing human facial expressions, especially in interpersonal relations. Face recognition is widely used in modern biometrics and security applications. In real-life applications, automated face recognition remains challenging because people express themselves differently. Real-world scenarios present many further challenges, such as low-quality images, temporal variations, and disguises that alter facial characteristics. Deep learning frameworks have been applied to face recognition, but several challenges remain, such as expression variations and the lack of training data. In these methods, margins are used to enforce intra-class compactness and inter-class discrepancy in order to prevent overfitting. To address these problems, we use multimodal facial expression detection for real-time face recognition. First, we use the Prewitt edge detector to detect the eyes, nose, mouth, and brows, and design a deep transfer learning model to extract deep features from face images. Next, harpy eagle search optimization (HESO) selects the optimal subset of the extracted features, which reduces the data dimensionality problem. Then, we employ a semi-region-based convolutional neural network (SR-CNN) to analyze facial expressions across multiple modalities: the eyes (blinking), nose (shape), mouth (smiling), and brows (shape). The analyzed facial expressions and the corresponding input images are fed into the benchmark SVM classifier for face recognition. This facial-expression-based face recognition framework is best suited to real-time scenarios such as video surveillance. Finally, we validate the performance of our framework on the benchmark multimodal Cohn-Kanade dataset and a real-time dataset of 5671 person images. The simulation results show that the proposed framework effectively suppresses data dimensionality issues and achieves a high recognition rate compared with state-of-the-art frameworks.
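The first stage of the pipeline relies on the Prewitt edge detector. The sketch below is a minimal, dependency-free illustration of that operator alone, written under the assumption of a grayscale image stored as a list of rows. It computes the standard 3×3 Prewitt gradient magnitude. How the framework turns such edge maps into eye, nose, mouth, and brow regions is not described here, and the function name `prewitt_magnitude` is hypothetical.

```python
import math

# Standard 3x3 Prewitt kernels: KX responds to horizontal intensity change
# (vertical edges), KY responds to vertical intensity change (horizontal edges).
KX = [[-1, 0, 1],
      [-1, 0, 1],
      [-1, 0, 1]]
KY = [[-1, -1, -1],
      [0, 0, 0],
      [1, 1, 1]]


def prewitt_magnitude(img):
    """Return the Prewitt gradient magnitude of a 2D grayscale image.

    Hypothetical helper for illustration. img is a list of equal-length rows
    of numbers. Border pixels, where the 3x3 window does not fit, are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            # Correlate the 3x3 neighbourhood with both kernels.
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += KX[i][j] * p
                    gy += KY[i][j] * p
            # Gradient magnitude sqrt(gx^2 + gy^2).
            out[r][c] = math.hypot(gx, gy)
    return out


# A vertical step edge: dark left columns, bright right columns.
step = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = prewitt_magnitude(step)
print(edges[2])  # [0.0, 30.0, 30.0, 0.0, 0.0]
```

The response is nonzero only where the 3×3 window straddles the intensity step. Using correlation instead of flipped-kernel convolution only changes the sign of each gradient component, so the magnitude is the same.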