“…To the best of our knowledge, existing vision-based HGR systems recognize gestures using a "one-shot" scheme, that is, they classify a gesture from the appearance of a single image. To achieve high recognition accuracy, the features in that image must therefore be extracted in full, which leads to high computational complexity and power overhead [16]-[18]. On the other hand, energy-efficient HGR systems [14], [19], [20] based on simplified features usually suffer from degraded stability, particularly when deployed against backgrounds with more interference.…”