Abstract. This paper presents the first method to compute dense scene flow in real-time for RGB-D cameras. It is based on a variational formulation in which brightness constancy and geometric consistency are imposed. Exploiting the depth data provided by RGB-D cameras, regularization of the flow field is imposed on the 3D surface (or set of surfaces) of the observed scene rather than on the image plane, leading to more geometrically consistent results. The minimization problem is efficiently solved by a primal-dual algorithm implemented on a GPU, achieving previously unattained runtime performance. Several tests have been conducted to compare our approach with a state-of-the-art method (RGB-D flow), with both quantitative and qualitative evaluations. Moreover, an additional set of experiments has been carried out to show the applicability of our work to real-time motion estimation. The results demonstrate the accuracy of our approach, which outperforms RGB-D flow and is able to estimate heterogeneous and non-rigid motions at a high frame rate.
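Primal-dual schemes of this kind typically follow the Chambolle-Pock iteration: a dual ascent step with projection, a primal proximal step, and over-relaxation. The sketch below applies that pattern to a simplified TV-L2 model on a single scalar field, not to the paper's coupled scene-flow energy; the energy, function names, and step sizes are our own illustrative assumptions.

```python
import numpy as np

def grad(u):
    # Forward differences with Neumann boundary conditions.
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    # Negative adjoint of grad.
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[:, 0], dx[:, 1:-1], dx[:, -1] = px[:, 0], px[:, 1:-1] - px[:, :-2], -px[:, -2]
    dy[0, :], dy[1:-1, :], dy[-1, :] = py[0, :], py[1:-1, :] - py[:-2, :], -py[:-2, :]
    return dx + dy

def primal_dual_tv(f, lam=10.0, iters=200, tau=0.25, sigma=0.25):
    # Chambolle-Pock for min_u |grad u|_1 + lam/2 * ||u - f||^2,
    # an illustrative stand-in for the paper's scene-flow energy.
    u, u_bar = f.copy(), f.copy()
    px, py = np.zeros_like(f), np.zeros_like(f)
    for _ in range(iters):
        # Dual ascent, then projection onto the unit ball (TV term).
        gx, gy = grad(u_bar)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
        px, py = px / norm, py / norm
        # Primal descent: closed-form prox of the quadratic data term.
        u_old = u
        u = (u + tau * (div(px, py) + lam * f)) / (1.0 + tau * lam)
        # Over-relaxation.
        u_bar = 2.0 * u - u_old
    return u
```

Every step of this iteration is an independent per-pixel update, which is why such solvers map so well onto a GPU.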
We propose a novel joint registration and segmentation approach to estimate scene flow from RGB-D images. Instead of assuming the scene to be composed of a number of independent rigidly-moving parts, we use non-binary labels to capture non-rigid deformations at transitions between the rigid parts of the scene. Thus, the velocity of any point can be computed as a linear combination (interpolation) of the estimated rigid motions, which provides better results than traditional sharp piecewise segmentations. Within a variational framework, the smooth segments of the scene and their corresponding rigid velocities are alternately refined until convergence. A K-means-based segmentation is employed as an initialization, and the number of regions is subsequently adapted during the optimization process to capture any arbitrary number of independently moving objects. We evaluate our approach with both synthetic and real RGB-D images that contain varied and large motions. The experiments show that our method estimates the scene flow more accurately than the most recent works in the field, and at the same time provides a meaningful segmentation of the scene based on 3D motion.
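To illustrate the interpolation idea, the following sketch blends K rigid motions, each written as a 6-D twist (angular and linear velocity), using per-point non-binary labels; the twist parameterization and all names are our assumptions rather than the paper's exact formulation.

```python
import numpy as np

def blended_scene_flow(points, twists, weights):
    """Per-point velocity as a soft-label blend of K rigid motions.

    points  : (N, 3) 3D points of the scene.
    twists  : (K, 6) rigid velocities [omega | v] (angular, linear).
    weights : (N, K) non-binary labels, each row summing to 1.
    """
    omega, v = twists[:, :3], twists[:, 3:]                 # (K, 3), (K, 3)
    # Rigid velocity of each point under each motion: omega_k x p + v_k.
    per_label = np.cross(omega[None, :, :], points[:, None, :]) + v[None, :, :]
    # Interpolate with the smooth segmentation weights.
    return np.einsum('nk,nkd->nd', weights, per_label)      # (N, 3)
```

With one-hot weights this reduces to a sharp piecewise-rigid segmentation; smooth weights yield exactly the non-rigid transitions between rigid parts described above.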
Creating textured 3D scans of indoor environments has experienced a large boost with the advent of cheap commodity depth sensors. However, the quality of the acquired 3D models is often impaired by color seams in the reconstruction due to varying illumination (e.g., shadows or highlights) and object surfaces whose brightness and color vary with the viewpoint of the camera. In this paper, we propose a direct and simple method to estimate the pure albedo of the texture, which allows us to remove illumination effects from IR and color images. Our approach first computes the illumination-independent albedo in the IR domain, which we subsequently transfer to the color albedo. As shadows and highlights lead to over- and underexposed image regions with little or no color information, we apply an advanced optimization scheme to infer color information in the color albedo from neighboring image regions. We demonstrate the applicability of our approach to various real-world scenes.
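Under a simple Lambertian model, image intensity factors as albedo times shading, so shading recovered in the IR domain can be divided out of the color channels. The following is a minimal sketch of that transfer step under this assumption; the paper's actual pipeline, including the optimization that inpaints over- and underexposed regions, is more involved, and all names are illustrative.

```python
import numpy as np

def transfer_albedo(ir_image, ir_albedo, color_image, eps=1e-6):
    """Transfer IR shading to the color channels (Lambertian assumption).

    ir_image    : (H, W) observed IR intensities.
    ir_albedo   : (H, W) illumination-independent IR albedo.
    color_image : (H, W, 3) observed color image.
    """
    # image = albedo * shading  =>  shading = image / albedo.
    shading = ir_image / np.maximum(ir_albedo, eps)
    # Divide the per-pixel shading out of each color channel.
    return color_image / np.maximum(shading[..., None], eps)
```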
In this paper, we introduce the concept of proximity priors into semantic segmentation in order to discourage the presence of certain object classes (such as 'sheep' and 'wolf') 'in the vicinity' of each other. 'Vicinity' encompasses spatial distance as well as specific spatial directions simultaneously; e.g., 'plates' are found directly above 'tables', but do not fly over them. In this sense, our approach generalizes the co-occurrence prior by Ladicky et al. [3], which does not incorporate spatial information at all, and the non-metric label distance prior by Strekalovskiy et al. [11], which only takes directly neighboring pixels into account and often hallucinates ghost regions. We formulate a convex energy minimization problem with an exact relaxation, which can be globally optimized. Results on the MSRC benchmark show that the proposed approach reduces the number of mislabeled objects compared to previous co-occurrence approaches.
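As a toy illustration of what such a prior penalizes, the snippet below scores a fixed hard labeling by counting pixels of class a that fall within a square window of class b, weighted by a pairwise cost matrix. The paper instead optimizes a convex relaxation over soft labelings and also encodes direction-specific offsets; the names and the isotropic window here are our simplifications.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def proximity_penalty(labels, pair_cost, window=15):
    """Score a hard labeling under a toy isotropic proximity prior.

    labels    : (H, W) integer label map.
    pair_cost : (L, L) cost of class a appearing near class b.
    window    : side length of the square 'vicinity' window.
    """
    L = pair_cost.shape[0]
    masks = [labels == a for a in range(L)]
    # Dilate each class mask to its vicinity.
    near = [maximum_filter(m.astype(float), size=window) > 0 for m in masks]
    total = 0.0
    for a in range(L):
        for b in range(L):
            if pair_cost[a, b] > 0:
                # Pixels of class a lying inside the vicinity of class b.
                total += pair_cost[a, b] * np.logical_and(masks[a], near[b]).sum()
    return total
```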
Abstract. We present an active learning framework for image segmentation with user interaction. Our system uses a sparse Gaussian Process classifier (GPC) trained on manually labeled image pixels (user scribbles) and refined in every active learning round. As a special feature, our method uses a very efficient online update rule to compute the class predictions in every round. The final segmentation of the image is computed via convex optimization. Results on a standard benchmark data set show that our algorithm outperforms a recent state-of-the-art method. We also show that the queries made by the algorithm are more informative than randomly increasing the training data, and that our online version is much faster than standard offline GPC inference.
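The efficiency of such an online update comes from revising the model incrementally when a new labeled pixel arrives instead of refitting from scratch. The sketch below shows that idea for a toy dense GP regressor, growing the inverse kernel matrix with a rank-1 block (Schur complement) update in O(n^2) per round; the paper's sparse GPC and its specific update rule are not reproduced here, and all names and hyperparameters are our assumptions.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    # Squared-exponential kernel between two point sets.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class OnlineGP:
    """Toy online GP (regression surrogate for a classifier)."""

    def __init__(self, noise=0.1):
        self.noise, self.X, self.y, self.K_inv = noise, None, None, None

    def add(self, x, label):
        # Incorporate one new scribbled pixel without refitting.
        x = x[None, :]
        if self.X is None:
            self.X, self.y = x, np.array([label], float)
            self.K_inv = 1.0 / (rbf(x, x) + self.noise ** 2)
            return
        k = rbf(self.X, x)                                   # (n, 1)
        s = rbf(x, x) + self.noise ** 2 - k.T @ self.K_inv @ k  # Schur complement
        a = self.K_inv @ k
        # Block-matrix inverse: grow K_inv by one row/column in O(n^2).
        self.K_inv = np.block([[self.K_inv + (a @ a.T) / s, -a / s],
                               [-a.T / s, 1.0 / s]])
        self.X = np.vstack([self.X, x])
        self.y = np.append(self.y, label)

    def predict(self, Xq):
        # Posterior mean at query points (used as a class score).
        return rbf(Xq, self.X) @ self.K_inv @ self.y
```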