“…Recently, neural implicit representations have demonstrated promising results for object geometry representation [7, 18, 20, 28, 30-32, 36, 50, 54, 57, 58], scene completion [5, 14, 33], novel view synthesis [19, 21, 34, 60], and generative modelling [6, 26, 27, 39]. A few recent papers [1, 3, 8, 23, 44] attempt to predict scene-level geometry from RGB-(D) inputs, but they all assume that camera poses are given. Another set of works [17, 51, 59] tackles camera pose optimization, but these methods require a lengthy optimization process, which makes them unsuitable for real-time applications.…”