3D symmetry detection is a fundamental problem in computer vision and graphics. Most prior works detect symmetry when the object model is fully known, while few study symmetry detection on partially observed objects, such as those captured in single RGB-D images. Recent work addresses the problem of detecting symmetries from incomplete data with a deep neural network by leveraging dense and accurate symmetry annotations. However, due to the tedious labeling process, full symmetry annotations are not always available in practice. In this work, we present a 3D symmetry detection approach that detects symmetry from single-view RGB-D images without symmetry supervision. The key idea is to train the network in a weakly supervised manner to complete the shape based on the predicted symmetry, such that the completed shape is similar to existing plausible shapes. To achieve this, we first propose a discriminative variational autoencoder that learns a shape prior for judging whether a 3D shape is plausible. Based on the learned shape prior, a symmetry detection network is presented to predict symmetries that, when used to complete the shape, produce shapes of high plausibility. Moreover, to facilitate end-to-end network training and multiple-symmetry detection, we introduce a new symmetry parametrization for learning-based estimation of both reflectional and rotational symmetries. The proposed approach, which couples symmetry detection with shape completion, essentially learns a symmetry-aware shape prior, facilitating more accurate and robust symmetry detection. Experiments demonstrate that the proposed method detects reflectional and rotational symmetries accurately and generalizes well to challenging scenarios, such as objects with heavy occlusion and scanning noise. Moreover, it achieves state-of-the-art performance, improving the F1-score over the existing supervised learning method by 2%-11% on the ShapeNet and ScanNet datasets.
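As a concrete illustration of the symmetry-based completion step described above, the following minimal Python sketch reflects a partial point cloud about a hypothesized symmetry plane and unions it with the original points. The plane parametrization (unit normal plus offset) and all names are illustrative assumptions, not necessarily the paper's actual parametrization, and the learned shape prior that scores the plausibility of the completed shape is omitted.

```python
import numpy as np

def reflect_points(points, normal, offset):
    """Reflect a point cloud about the plane {x : normal . x + offset = 0}.

    points : (N, 3) array of 3D points (the partial, single-view observation).
    normal : (3,) normal of the hypothesized symmetry plane.
    offset : scalar plane offset.
    """
    normal = normal / np.linalg.norm(normal)
    dist = points @ normal + offset              # signed distance of each point to the plane
    return points - 2.0 * dist[:, None] * normal[None, :]

def complete_by_symmetry(points, normal, offset):
    """Union of the partial cloud and its mirror image: the symmetry-completed shape."""
    return np.concatenate([points, reflect_points(points, normal, offset)], axis=0)

if __name__ == "__main__":
    partial = np.random.rand(1024, 3)            # stand-in for a single-view scan
    n, d = np.array([1.0, 0.0, 0.0]), -0.5       # hypothesized reflection plane x = 0.5
    completed = complete_by_symmetry(partial, n, d)
    print(completed.shape)                       # (2048, 3)
```

In the full approach, the predicted symmetry parameters would be produced by the detection network and the completed cloud fed to the learned prior, so that plausibility of the completion supervises symmetry prediction.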
Learning-based multi-view stereo (MVS) has thus far centered on 3D convolution over cost volumes. Due to the high computation and memory consumption of 3D CNNs, the resolution of the output depth is often considerably limited. Unlike most existing works, which are dedicated to adaptive refinement of cost volumes, we opt to directly optimize the depth value along each camera ray, mimicking the range (depth) finding of a laser scanner. This reduces the MVS problem to ray-based depth optimization, which is much more lightweight than full cost volume optimization. In particular, we propose RayMVSNet, which learns sequential prediction of a 1D implicit field along each camera ray, with the zero-crossing point indicating scene depth. This sequential modeling, conducted on transformer features, essentially learns the epipolar line search of traditional multi-view stereo. We devise a multi-task learning scheme for better optimization convergence and depth accuracy. We find that the monotonicity of the signed distance function (SDF) along each ray greatly benefits depth estimation. Our method ranks first among all previous learning-based methods on both the DTU and Tanks & Temples datasets, achieving an overall reconstruction score of 0.33mm on DTU and an F-score of 59.48% on Tanks & Temples. It produces high-quality depth estimation and point cloud reconstruction in challenging scenarios such as objects/scenes with non-textured surfaces, severe occlusion, and highly varying depth range. Further, we propose RayMVSNet++, which enhances contextual feature aggregation for each ray by designing an attentional gating unit that selects semantically relevant neighboring rays within the local frustum around that ray. This improves performance on datasets with more challenging examples (e.g., low-quality images caused by poor lighting conditions or motion blur). RayMVSNet++ achieves state-of-the-art performance on the ScanNet dataset. In particular, it attains an AbsRel of 0.058m and produces accurate results on the two subsets of textureless regions and large depth variation.
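To make the ray-based formulation concrete, the sketch below shows how scene depth can be read off as the zero-crossing of per-ray signed-distance predictions at sampled depth hypotheses, using linear interpolation between the two samples that bracket the sign change. This is an assumption-laden illustration of the zero-crossing idea, not the paper's sequential transformer pipeline; the sampling scheme and function names are hypothetical.

```python
import numpy as np

def ray_depth_from_sdf(depth_samples, sdf_values):
    """Recover depth along one camera ray as the zero-crossing of a 1D implicit field.

    depth_samples : (K,) increasing depth hypotheses sampled along the ray.
    sdf_values    : (K,) predicted signed distances (positive before the surface,
                    negative behind it), assumed to decrease monotonically.
    Returns the linearly interpolated zero-crossing depth, or None if no sign change.
    """
    crossings = np.where(np.diff(np.sign(sdf_values)) < 0)[0]
    if crossings.size == 0:
        return None                               # surface not crossed within the sampled range
    i = crossings[0]
    d0, d1 = depth_samples[i], depth_samples[i + 1]
    s0, s1 = sdf_values[i], sdf_values[i + 1]
    t = s0 / (s0 - s1)                            # fraction of the interval where the field hits zero
    return d0 + t * (d1 - d0)

if __name__ == "__main__":
    depths = np.linspace(0.5, 2.0, 16)            # depth hypotheses along one ray
    sdf = 1.25 - depths                           # toy monotone field with the surface at depth 1.25
    print(ray_depth_from_sdf(depths, sdf))        # ~1.25
```

The monotonicity assumption noted in the abstract is what makes the first (and only) sign change a well-defined depth estimate for the ray.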