“…Addressing distribution shift is a crucial research problem because deep learning models are fragile when the test distribution differs from the training distribution [51]. To this end, various benchmarks have been proposed to measure robustness under distribution shift [9,14,23,25,26,29,30,45,48,50], and the problem has been studied extensively across a broad range of research fields [3,4,10,15,16,24,38,39,40,43,52,55,62]. Among these, work on benchmarking robustness [23] and on resolving scene bias [10,42] or distribution shift [43,59] is most closely related to our problem setup.…”