“…Traditional fully-supervised deep learning methods typically require large amounts of annotated data, introducing a significant prone-to-ambiguity annotation workload [36,37,47,55]. For this reason, learning with scarce data (i.e., few-shot learning) has received increasing attention, in domains like object detection [8,14,28,42,43,44], action recognition [1,2,4,13,54,59], and action localization [9,15,51,52].…”