Automatic detection of backchannel has great potential to enhance artificial mediators, which indicate listeners' attention and agreement in human communication. It is often expressed by subtle nonverbal cues that occur briefly and sparsely. Focusing on identifying and locating these subtle cues (i.e., their occurrence moment and the involved body parts), this paper proposes a novel approach for backchannel detection. In particular, our model utilizes temporaland modality-attention modules to determine and lead the model to pay more attention to both the indicative moment and the accompanying body parts at that specific time. It achieves an accuracy of 68.6% on the testing set in MultiMediate'23 backchannel detection challenge, outperforming the counterparts. Furthermore, we conducted an ablation study to thoroughly understand the contributions of our model. This study underscores the effectiveness of our selection of modality inputs and the importance of the two attention modules in our model.
CCS CONCEPTS• Computing methodologies → Artificial intelligence.