“…Here, we focus on video domain adaptation for activity recognition. State-of-the-art visual-only solutions learn to reduce the shift in activity appearance through adversarial training [5,6,8,9,20,27,29] and self-supervised learning techniques [9,22,27,34]. While Jamal et al. [20] and Munro and Damen [27] directly penalize domain-specific features with an adversarial loss at every timestep, Chen et al. [5], Choi et al. [9], and Pan et al. [29] attend to the temporal segments that contain the most informative cues.…”
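
As a rough illustration of the first strategy, the sketch below shows one common way to penalize domain-specific features at every timestep: a gradient-reversal layer feeds each timestep's feature into a small domain classifier, so the backbone is trained to make per-timestep features domain-indistinguishable. This is a minimal sketch assuming a PyTorch-style backbone that outputs per-timestep clip features; all names, dimensions, and the exact loss formulation are illustrative and not taken from the cited methods.

    # Minimal sketch (assumptions: PyTorch; feature sizes and names are
    # illustrative, not drawn from any cited work) of per-timestep
    # adversarial domain alignment via a gradient-reversal layer.
    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity on the forward pass; flips and scales gradients on the backward pass."""
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    class PerTimestepDomainDiscriminator(nn.Module):
        """Classifies source vs. target domain from the feature at every timestep."""
        def __init__(self, feat_dim=256, lambd=1.0):
            super().__init__()
            self.lambd = lambd
            self.classifier = nn.Sequential(
                nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 2)
            )

        def forward(self, feats):
            # feats: (batch, time, feat_dim) per-timestep features from a video backbone
            b, t, d = feats.shape
            reversed_feats = GradReverse.apply(feats.reshape(b * t, d), self.lambd)
            return self.classifier(reversed_feats)  # (batch * time, 2) domain logits

    # Toy usage: a cross-entropy domain loss summed over all timesteps of source and
    # target clips; the reversed gradient pushes the backbone toward domain-invariant
    # per-timestep features.
    disc = PerTimestepDomainDiscriminator(feat_dim=256)
    feats = torch.randn(4, 8, 256)                        # 4 clips, 8 timesteps each (toy values)
    domain_labels = torch.zeros(4 * 8, dtype=torch.long)  # e.g. 0 = source, 1 = target
    loss = nn.functional.cross_entropy(disc(feats), domain_labels)
    loss.backward()

The attention-based alternatives mentioned above differ mainly in where this penalty is applied: rather than weighting every timestep equally, they learn to emphasize the temporal segments whose alignment matters most for recognizing the activity.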