A Digital Twin is the seamless, bidirectional data integration between a physical machine and its virtual counterpart. Emotion recognition in healthcare is becoming increasingly important owing to recent advances in Machine Learning methods. However, it faces technical challenges such as limited datasets, occlusion and lighting issues, identification of key features, incorrect emotion classification, high implementation costs, head posture, and a person's cultural background. This paper proposes a novel approach to emotion recognition based on facial expression and body movement recognition. Three devices (Kinect 1, Kinect 2, and an RGB HD camera) are used to construct a new bi-modal database containing 17 participants' performances of six emotional states. Two mono-modal classifiers were developed to extract sufficient state information from facial expression analysis and body motion analysis, respectively. The system's performance is assessed using three algorithms: Bagged Trees, k-Nearest Neighbors (k-NN), and Support Vector Machines (Linear and Cubic kernels). The findings demonstrate the strong performance of the proposed method and the effectiveness of the proposed features, particularly the combination of 3D distance and 3D angle, in characterising and identifying emotions. Results obtained with Kinect 2 marginally surpass those with Kinect 1. A comparison of 2D RGB and RGB-D data shows that depth information significantly raises the recognition rate: RGB-D features can represent emotions, although discrepancies remain between the RGB and RGB-D results.
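The abstract does not specify how the 3D distance and 3D angle features are computed; as a minimal illustrative sketch (not the authors' implementation), the following Python snippet shows one plausible form of each feature derived from hypothetical Kinect skeleton joint coordinates. The joint names and values are assumptions for demonstration only.

```python
import numpy as np

def joint_distance(p, q):
    """3D Euclidean distance between two skeleton joints given as (x, y, z)."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

def joint_angle(a, b, c):
    """3D angle (radians) at joint b, formed by the segments b->a and b->c."""
    u = np.asarray(a) - np.asarray(b)
    v = np.asarray(c) - np.asarray(b)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against floating-point values slightly outside [-1, 1]
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Hypothetical joint positions in metres from one skeleton frame
shoulder = (0.00, 1.40, 2.00)
elbow    = (0.20, 1.10, 2.00)
wrist    = (0.35, 0.85, 1.90)

# A feature vector combining one 3D distance and one 3D angle,
# in the spirit of the feature combination described in the abstract
features = [
    joint_distance(shoulder, wrist),      # 3D distance feature
    joint_angle(shoulder, elbow, wrist),  # 3D angle feature
]
print(features)
```

Feature vectors of this kind, computed over many joint pairs and triples, could then be fed to the classifiers named above (e.g. scikit-learn's `KNeighborsClassifier` or `SVC`); the specific joint selection used in the paper is not given here.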