“…Researchers have used other forms of supervision (strong supervision, weak supervision, imitation learning, reinforcement learning, inverse reinforcement learning) to build interactive understanding of objects. This understanding can take the form of learning a) where and how to grasp [9,21,26,27,30,35,36,39,43,51], b) state classifiers [25], c) interaction hotspots [15,42,44,61], d) spatial priors for action sites [46], e) object articulation modes [12,38], f) reward functions [29,31,50,52], and g) functional correspondences [34]. While our work pursues similar goals, we differ in our supervision source (observation of human hands interacting with objects in egocentric videos).…”