Personal Identification Numbers (PINs) are among the most common means of user authentication owing to their simplicity and convenience, yet they have been targeted by numerous side-channel attacks that pose a severe threat to people’s privacy and property. Existing attacks usually succeed only when nothing occludes the attacker’s view of the victim’s hand gestures, a requirement that makes the attack harder to launch and raises the risk of exposure. To overcome this limitation, we propose ArmSpy++, an improved video-assisted PIN inference attack built upon our previous work, ArmSpy. Specifically, ArmSpy++ employs new modules that leverage additional features, such as keystroke-induced elbow bending, wrist speed variation, and the spatial relationships between arm joints, to detect keystrokes correctly. ArmSpy++ also exploits the perspective relationship and natural typing habits to ensure a high PIN inference success rate. We also redesigned the inferred PIN pattern coordination mechanism to deduce PINs accurately. By using a pre-trained HigherHRNet model for posture estimation, ArmSpy++ eliminates the need for additional training. Extensive experiments demonstrate that ArmSpy++ can achieve over
\(83.1\%\) average accuracy with 3 attempts and even \(92.5\%\) for some victims, indicating the severity of the threat posed by ArmSpy++.