“…Theorem 2.1 is the foundation of many recent works on unsupervised domain adaptation via learning invariant representations [Ajakan et al., 2014, Ganin et al., 2016, Zhao et al., 2018b, Pei et al., 2018, Zhao et al., 2018a]. It has also inspired various applications of domain adaptation with adversarial learning, e.g., video analysis [Hoffman et al., 2016, Shrivastava et al., 2016], natural language understanding [Zhang et al., 2017, Fu et al., 2017], and speech recognition [Zhao et al., 2019a, Hosseini-Asl et al., 2018], to name a few. At a high level, the key idea is to learn a rich and parametrized feature transformation g : X → Z such that the induced source and target distributions (on Z) are close, as measured by the H-divergence.…”
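To make the last sentence concrete, the following is a minimal sketch of how the choice of g changes the divergence between the induced source and target distributions on Z. It is an illustration under several assumptions that are not in the excerpt: Z is one-dimensional, the hypothesis class H consists of threshold classifiers h_t(z) = 1[z > t], the divergence is taken as sup over h in H of |Pr_S[h(z) = 1] − Pr_T[h(z) = 1]| (some texts include an extra factor of 2), and the data are synthetic. The names `empirical_h_divergence`, `g_raw`, and `g_invariant` are hypothetical. The sketch only evaluates two fixed maps; it does not perform the adversarial training used in the cited works.

```python
import random


def empirical_h_divergence(source_z, target_z):
    """Estimate sup_t |P_S[z > t] - P_T[z > t]| from two 1-D samples.

    Assumption: H is the class of threshold classifiers h_t(z) = 1[z > t].
    """
    best = 0.0
    # The empirical probabilities only change at sample points,
    # so it is enough to try those values as thresholds.
    for t in sorted(set(source_z) | set(target_z)):
        p_s = sum(z > t for z in source_z) / len(source_z)
        p_t = sum(z > t for z in target_z) / len(target_z)
        best = max(best, abs(p_s - p_t))
    return best


# Synthetic inputs x = (content, nuisance); only the nuisance differs by domain.
rng = random.Random(0)
n = 500
source_x = [(rng.gauss(0.0, 1.0), 0.0) for _ in range(n)]
target_x = [(rng.gauss(0.0, 1.0), 3.0) for _ in range(n)]


def g_raw(x):
    """Hypothetical feature map that keeps the domain-specific shift."""
    return x[0] + x[1]


def g_invariant(x):
    """Hypothetical feature map that discards the nuisance coordinate."""
    return x[0]


d_raw = empirical_h_divergence(
    [g_raw(x) for x in source_x], [g_raw(x) for x in target_x]
)
d_inv = empirical_h_divergence(
    [g_invariant(x) for x in source_x], [g_invariant(x) for x in target_x]
)

print(f"divergence under g_raw:       {d_raw:.3f}")
print(f"divergence under g_invariant: {d_inv:.3f}")
```

In this toy setting the map that drops the domain-specific coordinate should yield a much smaller estimate than the map that keeps it, which is the sense in which a representation is called invariant. In the adversarial approaches cited above, g is learned rather than fixed by hand, and the supremum over H is approximated by a trained domain discriminator.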