“…The second group of methods includes faster two-stage approaches [44,49,47,106,105,8,81,28,23]. These first generate multiple views of the object using a text-to-image or text-to-video model [52,13] tuned to output multiple views, and then run per-scene optimization using NeRF [56] or 3DGS [37]. However, per-scene optimization requires several highly consistent views, which are difficult to generate reliably.…”