The rapid development of wearable sensors has made data collection in
daily human life increasingly convenient. Human Activity Recognition (HAR), as a
prominent research direction for wearable applications, has made
remarkable progress in recent years. However, existing efforts mostly
focus on improving recognition accuracy and pay limited attention to a
model's functional scalability, in particular its capacity for continual
learning. This limitation greatly restricts deployment in
open-world scenarios. Moreover, due to storage and privacy concerns, it
is often impractical to retain the activity data of different users for
subsequent tasks, especially egocentric visual information. Furthermore,
the imbalance between the visual and inertial measurement unit (IMU)
sensing modalities leads to poor generalization when conventional
continual learning techniques are applied. In this paper, we
propose a motivational learning scheme to address the limited
generalization caused by this modality imbalance, enabling foreseeable
generalization in a visual-IMU multimodal network. To overcome
forgetting, we introduce a robust representation estimation technique
and a pseudo-representation generation strategy for continual learning.
Experimental results on the egocentric multimodal activity dataset
UESTC-MMEA-CL demonstrate the effectiveness of our proposed method.
In addition, our method effectively leverages the generalization
capability of IMU-based modality representations, outperforming both
classical and state-of-the-art continual learning methods under various
task settings.