“…Leveraging scene context is useful for object detection [40,7,11,13], semantic segmentation [35,39,40,69], predicting invisible things [32], and action recognition without looking at the human [23,54]. Some work has shown that explicitly factoring human action out of context improves action recognition performance [71,61]. In contrast to prior work that uses scene context to facilitate recognition, our method aims to learn representations that are invariant to scene bias.…”