Widely accepted explicit authentication protocols are vulnerable to attacks such as shoulder surfing and smudge attacks, and they leave users with the constant burden of periodic password changes. We therefore propose a novel framework for continuous authentication on smartphones. The approach builds on pattern unlocking, which is already widely used and therefore imposes no additional learning cost on users. After collecting multi‐modal data that describe both behavioral and contextual information, we employ a multi‐branch context‐aware attention network as the representation learner to perform feature extraction; an autoencoder is then used for authentication. To overcome challenges such as cold‐start and few‐shot training, which are less discussed in other works, we incorporate transfer learning with a coarse‐to‐fine pre‐training workflow. Additionally, we deploy a hierarchical approach to offload model tuning overhead from smartphones. Extensive experiments on more than 68 000 real‐world recordings validate the effectiveness of the proposed method, which achieves an equal error rate (EER) of 2.472% under mixed contexts and consistently outperforms state‐of‐the‐art approaches under both static and mixed contexts.
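
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an EER can be estimated from authentication scores. It is not the evaluation code of this work. It assumes, purely for illustration, that each unlock attempt yields a scalar anomaly score (for example, an autoencoder reconstruction error) where lower values indicate the legitimate owner. The function name `equal_error_rate` and the toy scores are hypothetical.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate the EER for anomaly scores (lower = more likely the owner).

    Assumption: a sample is accepted when its score is <= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Candidate thresholds: every observed score, sorted and de-duplicated.
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # False rejection rate: fraction of owner samples scoring above the threshold.
    frr = np.array([(genuine > t).mean() for t in thresholds])
    # False acceptance rate: fraction of impostor samples at or below the threshold.
    far = np.array([(impostor <= t).mean() for t in thresholds])
    # The EER is taken at the threshold where the two error rates are closest.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, float(thresholds[i])

# Toy scores, invented for illustration only (not data from the experiments).
owner_scores = [0.1, 0.2, 0.3, 0.6]
attacker_scores = [0.4, 0.5, 0.7, 0.8]
eer, threshold = equal_error_rate(owner_scores, attacker_scores)
print(f"EER = {eer:.3f} at threshold {threshold:.2f}")
```

With a finite set of scores the two error curves rarely cross exactly, so the sketch averages FAR and FRR at the closest threshold. Interpolating between neighbouring thresholds is a common refinement.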