Figure 1: The proposed method estimates a persistent, temporally-aware scene model M i from a series of scene observations S i , captured at sparse time intervals. M i−1 is used to estimate an arrangement of objects in each novel observation S i . The estimated arrangement is used to estimate the instance segmentation of S i , which is then used to update the model M i .
AbstractIn depth-sensing applications ranging from home robotics to AR/VR, it will be common to acquire 3D scans of interior spaces repeatedly at sparse time intervals (e.g., as part of regular daily use). We propose an algorithm that analyzes these "rescans" to infer a temporal model of a scene with semantic instance information. Our algorithm operates inductively by using the temporal model resulting from past observations to infer an instance segmentation of a new scan, which is then used to update the temporal model. The model contains object instance associations across time and thus can be used to track individual objects, even though there are only sparse observations. During experiments with a new benchmark for the new task, our algorithm outperforms alternate approaches based on stateof-the-art networks for semantic instance segmentation.