“…Several techniques have been introduced to solve this problem with deep models (Muandet et al, 2013; Li et al, 2017, 2018a; Motiian et al, 2017), with important results for a variety of datasets and data types, but the area is significantly under-explored with respect to video datasets, due to the complexity of entangling spatial and temporal domain shifts. In Yao et al (2019, 2021), the only recent prominent work in this area, the authors present the Adversarial Pyramid Network (APN), a network capturing the videos' local-, global-, and multi-layer cross-relation features. They also extend ADA, an adversarial data augmentation method from Volpi et al (2018), to videos.…”