“…Mildenhall et al. [39] were the first to propose the neural radiance field (NeRF), which implicitly represents static 3D scenes and synthesizes novel views from multiple posed images. Inspired by this success, many NeRF-based models [2], [10], [12], [14], [20], [21], [22], [24], [26], [34], [36], [37], [40], [42], [44], [46], [49], [53], [55], [64], [67], [75], [78] have been proposed. For example, Point-NeRF [65] and DS-NeRF [15] incorporate sparse 3D point clouds and depth information to eliminate the geometric ambiguity of NeRFs, achieving more accurate and efficient 3D point sampling as well as better rendering quality.…”