“…While massively multilingual models have obtained impressive quality improvements for low-resource languages as well as zero-shot scenarios (Aharoni et al., 2019; Arivazhagan et al., 2019a), it has not yet been shown how these massively multilingual models could be extended to unseen languages, beyond the pipelined approaches (Currey and Heafield, 2019; Lakew et al., 2019). On the other hand, self-supervised learning approaches have excelled at downstream cross-lingual transfer (Devlin et al., 2019; Raffel et al., 2019), but their success for unsupervised NMT (Conneau and Lample, 2019; Song et al., 2019) currently lacks robustness when languages are distant or monolingual data domains are mismatched (Neubig and Hu, 2018; Vulić et al., 2019). We observe that these two lines of research can be quite complementary and can compensate for each other's deficiencies.…”