Applying emotion recognition and classification technology to human-robot interaction requires fast data processing and model weight reduction. This paper proposes a new labeling method based on photoplethysmogram (PPG) and galvanic skin response (GSR) signals from Asian multimodal data, a real-time emotion classification method, a 1D convolutional neural network (CNN) autoencoder model, and a lightweight model obtained through knowledge distillation. Model performance was verified on the public DEAP dataset and the Asian multimodal dataset 'MERTI-Apps'. For emotion classification, the bio-signal data were window-sliced in one-pulse units, and the labels were reset to reflect the characteristics of the PPG and GSR signals. Only simple pre-processing, such as preventing signal loss and waveform duplication, was performed, without handcrafted features. In experiments, the proposed model achieved accuracies of 79.18% (arousal) and 74.84% (valence) on MERTI-Apps under 3-class criteria, and 81.33% (arousal) and 80.25% (valence) on DEAP under 2-class criteria. The lightweight model achieved 77.87% (arousal) and 73.49% (valence) under 3-class criteria, with a computation time reduced by more than 80% relative to the proposed 1D convolutional autoencoder model. The proposed model also improved computation time and accuracy over previous studies on MERTI-Apps, and the lightweight model enabled fast computation and real-time emotion classification in limited hardware environments.
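To make the described pipeline concrete, the following is a minimal PyTorch sketch of its two main components: a 1D CNN autoencoder teacher with a classification head, and a lightweight student trained via standard knowledge distillation. All layer sizes, the window length (128 samples per pulse-aligned window), and the distillation hyper-parameters (temperature T, mixing weight alpha) are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv1dAutoencoderClassifier(nn.Module):
    """Teacher: 1D CNN autoencoder with a classification head (assumed architecture)."""
    def __init__(self, in_channels=2, num_classes=3):  # 2 channels: PPG + GSR
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # reconstruction branch of the autoencoder
            nn.ConvTranspose1d(64, 32, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(32, in_channels, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
        )
        self.classifier = nn.Sequential(  # 3-class arousal/valence head
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

class StudentNet(nn.Module):
    """Lightweight student: a much smaller 1D CNN classifier."""
    def __init__(self, in_channels=2, num_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, num_classes),
        )

    def forward(self, x):
        return self.net(x)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Hinton-style KD: soft teacher targets blended with hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Example: one distillation step on a batch of pulse-aligned windows.
teacher, student = Conv1dAutoencoderClassifier(), StudentNet()
x = torch.randn(8, 2, 128)      # (batch, PPG+GSR channels, samples per window)
y = torch.randint(0, 3, (8,))   # 3-class arousal or valence labels
with torch.no_grad():
    _, t_logits = teacher(x)    # teacher assumed pre-trained and frozen
loss = distillation_loss(student(x), t_logits, y)
loss.backward()
```

The student here has roughly an order of magnitude fewer parameters than the teacher, which is the general mechanism by which distillation could deliver the reported >80% reduction in computation time; the exact student design in the paper may differ.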