As 5G communication technology enables faster access to ever-larger volumes of information and knowledge, a more sophisticated human-machine interface beyond touchscreens and keyboards is needed to increase communication bandwidth and overcome the interfacing barrier. However, completely replicating the full extent of human interaction, spanning operational dexterity, spatial awareness, sensory feedback, and collaborative capability, remains a challenge. Here, we demonstrate a hybrid-flexible wearable system, consisting of simple bimodal capacitive sensors and a customized low-power interface circuit integrated with machine learning algorithms, that accurately recognizes complex gestures. The 16-channel sensor array simultaneously extracts spatial and temporal information on finger movement (deformation) and hand location (proximity). Using machine learning, accuracies of over 99% and 91% are achieved for user-independent static and dynamic gesture recognition, respectively. Our approach demonstrates that an extremely simple bimodal sensing platform that identifies local interactions and perceives spatial context concurrently is crucial in the fields of sign communication, remote robotics, and smart manufacturing.
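To make the term "user-independent" concrete, the following is a minimal sketch of a leave-one-user-out evaluation on 16-channel frames, in which each user's data is held out in turn and the classifier is trained only on the remaining users. It is an illustration under stated assumptions rather than the method used in this work: the data are synthetic placeholders, the nearest-centroid classifier stands in for the unspecified machine learning algorithm, and all function names and parameters (`make_synthetic_frames`, `fit_centroids`, `predict`, `leave_one_user_out`, the user and gesture counts, the noise levels) are hypothetical.

```python
import random
from collections import defaultdict

N_CHANNELS = 16  # matches the 16-channel sensor array described above


def make_synthetic_frames(n_users=4, n_gestures=3, n_repeats=5, seed=0):
    """Return placeholder samples as (user_id, gesture_id, frame) tuples.

    The values are random numbers, not measured capacitances.
    """
    rng = random.Random(seed)
    # One hypothetical prototype pattern per gesture.
    prototypes = [
        [rng.uniform(0.0, 1.0) for _ in range(N_CHANNELS)]
        for _ in range(n_gestures)
    ]
    samples = []
    for user in range(n_users):
        offset = rng.uniform(-0.05, 0.05)  # assumed per-user baseline shift
        for gesture, proto in enumerate(prototypes):
            for _ in range(n_repeats):
                frame = [v + offset + rng.gauss(0.0, 0.02) for v in proto]
                samples.append((user, gesture, frame))
    return samples


def fit_centroids(samples):
    """Average the training frames of each gesture into one centroid."""
    sums = defaultdict(lambda: [0.0] * N_CHANNELS)
    counts = defaultdict(int)
    for _, gesture, frame in samples:
        counts[gesture] += 1
        acc = sums[gesture]
        for i, value in enumerate(frame):
            acc[i] += value
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def predict(centroids, frame):
    """Return the gesture whose centroid is closest in squared distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(frame, centroid))
    return min(centroids, key=lambda g: sq_dist(centroids[g]))


def leave_one_user_out(samples):
    """Hold out each user in turn; train on the rest; return per-user accuracy."""
    users = sorted({user for user, _, _ in samples})
    scores = {}
    for held_out in users:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, f) == g for _, g, f in test)
        scores[held_out] = correct / len(test)
    return scores


if __name__ == "__main__":
    for user, acc in leave_one_user_out(make_synthetic_frames()).items():
        print(f"held-out user {user}: accuracy {acc:.2f}")
```

The point of the split is that no frame from the test user is ever seen during training, so the reported accuracy reflects generalization to a new wearer rather than memorization of one person's hand. Dynamic gestures would additionally require a sequence of frames per sample, which this sketch does not model.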