“…It is well known that deep models trained on synthetic images require domain adaptation to perform well on real-world images [8, 9], and that this adaptation must be unsupervised if no labels are available for the real-world images [10]. Thus, this paper falls into the realm of unsupervised domain adaptation (UDA) for semantic segmentation [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], i.e., in contrast to approaches that assume access to labels from the target domain [23, 24]. Note that the great relevance of UDA in this context comes from the fact that, until now, pixel-level semantic image segmentation labels have been obtained through cumbersome and error-prone manual work.…”