“…First, the data collection process should minimize the effort required for new users to get started. However, previous approaches to SSIs, not limited to lipreading-based ones, adopt a train-from-scratch model that requires collecting hundreds of samples from real users [34,57,58,69], imposing an excessive mental and physical burden on users. Second, such data, collected intensively in controlled laboratory environments, yields a biased model that can be sensitive to even minor changes in factors such as lighting, face orientation, and posture, yet there is little discussion of the model's ability to generalize to unseen environments.…”