“…They explore diverse training strategies [84,83], mine extra supervisory signals from synthesized samples [27,71,28] or auxiliary tasks [83,35,53,93,78], and investigate intelligent path planning [39,54,81]. For structured and long-range context modeling, recent solutions have been built on environment maps [92,13,21,80], transformer architectures [33,61,48,64,11], and multimodal pretraining [56,31,30,12].…”