“…On top of such semantic exploration, the perception skills in terms of where to look and the navigation can be further disentangled [23] for an improved success rate. Moreover, spatial relations among objects have also been formulated as graphs and embedded via Graph Convolutional Networks to guide the navigation policy [24], where external commonsense knowledge has also shown advantages for the object localisation via spatial graph learning [3].…”