To autonomously move and manipulate objects in cluttered indoor environments, a service robot requires the ability to perceive 3D scenes. Although 3D object detection can provide an object-level description of the environment, a robot that detects continuously in a cluttered room inevitably encounters incomplete object observations, recurring detections of the same object, detection errors, and intersections between objects. To address these problems, we propose a two-stage 3D object detection algorithm that fuses multiple views of 3D object point clouds in the first stage and eliminates unreasonable and intersecting detections in the second stage. For each view, the robot performs 2D object semantic segmentation and obtains 3D object point clouds. An unsupervised segmentation method, Locally Convex Connected Patches (LCCP), is then used to accurately separate the object from the background. Manhattan Frame estimation is subsequently applied to calculate the main orientation of the object, from which the 3D bounding box is obtained. To handle objects detected across multiple views, we construct an object database and propose an object fusion criterion to maintain it automatically; the same object observed in multiple views is thus fused and a more accurate bounding box can be calculated. Finally, we propose an object filtering approach based on prior knowledge to remove incorrect and intersecting objects from the object database. Experiments are carried out on both the SceneNN dataset and a real indoor environment to verify the stability and accuracy of 3D semantic segmentation and object bounding box detection with multi-view fusion.
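The abstract does not specify the fusion criterion in detail; the following minimal sketch illustrates one plausible realization, in which detections of the same class whose 3D bounding boxes overlap beyond a threshold are treated as the same object and their point clouds are merged. The class names, the IoU threshold, and the axis-aligned box assumption are illustrative choices, not the paper's exact formulation.

```python
# Hypothetical multi-view object-fusion step: merge detections of the same
# class whose axis-aligned 3D boxes overlap, so the refitted box is more complete.
import numpy as np

def box_iou_3d(a, b):
    """IoU of two axis-aligned boxes given as (min_xyz, max_xyz) tuples."""
    lo = np.maximum(a[0], b[0])
    hi = np.minimum(a[1], b[1])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(a[1] - a[0])
    vol_b = np.prod(b[1] - b[0])
    return inter / (vol_a + vol_b - inter + 1e-9)

class ObjectDatabase:
    def __init__(self, iou_threshold=0.3):          # threshold is an assumption
        self.objects = []                            # {"label": str, "points": (N,3)}
        self.iou_threshold = iou_threshold

    @staticmethod
    def bbox(points):
        return points.min(axis=0), points.max(axis=0)

    def add_detection(self, label, points):
        """Fuse a new single-view detection into the database."""
        new_box = self.bbox(points)
        for obj in self.objects:
            if obj["label"] == label and \
               box_iou_3d(self.bbox(obj["points"]), new_box) > self.iou_threshold:
                # Same object seen from another view: merge the point clouds.
                obj["points"] = np.vstack([obj["points"], points])
                return
        self.objects.append({"label": label, "points": points})

# Usage: two partial views of the same chair end up as one database entry.
db = ObjectDatabase()
db.add_detection("chair", np.random.rand(100, 3))
db.add_detection("chair", np.random.rand(100, 3) + 0.1)
print(len(db.objects))
```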
Convolutional neural network (CNN)-based methods, which train an end-to-end model to regress a robot's six-degree-of-freedom (DoF) pose from a single red-green-blue (RGB) image, have recently been developed to overcome the poor robustness of robot visual relocalization. However, pose precision degrades when the test image is dissimilar to the training images. In this paper, we propose a novel method, named image-similarity-based CNN, which takes the similarity between the input image and the training images into account: the higher the similarity of the input image, the higher the precision that can be achieved. We therefore crop the input image into several small image blocks and measure the similarity between each cropped block and the training dataset images using a feature vector from a fully connected layer of the CNN. The most similar image block is then selected to regress the pose, and a genetic algorithm is used to determine the cropping positions. Experiments are conducted on both the open-source 7-Scenes dataset and two real indoor environments. The results show that the proposed algorithm outperforms existing solutions and effectively reduces large regression errors.
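As a rough illustration of the block-selection idea, the sketch below scores candidate crops against training-set feature vectors and keeps the most similar one. The functions extract_fc_features and the use of cosine similarity are stand-ins for the paper's CNN features and similarity measure, and the crop positions would in practice come from the genetic algorithm; all names and values here are assumptions.

```python
# Hypothetical selection of the most training-similar crop before pose regression.
import numpy as np

def extract_fc_features(image_block):
    """Placeholder for the CNN's fully connected layer activation."""
    return np.asarray(image_block, dtype=float).ravel()[:128]

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def crop_blocks(image, block_size, positions):
    """Crop candidate blocks at positions chosen, e.g., by a genetic algorithm."""
    h, w = block_size
    return [image[y:y + h, x:x + w] for (y, x) in positions]

def select_best_block(image, train_features, block_size, positions):
    """Return the crop whose feature vector is closest to the training set."""
    best_block, best_score = None, -np.inf
    for block in crop_blocks(image, block_size, positions):
        f = extract_fc_features(block)
        score = max(cosine_similarity(f, t) for t in train_features)
        if score > best_score:
            best_block, best_score = block, score
    return best_block, best_score

# Usage with random stand-in data.
image = np.random.rand(240, 320)
train_features = [np.random.rand(128) for _ in range(10)]
block, score = select_best_block(image, train_features, (120, 160), [(0, 0), (60, 80)])
print(block.shape, round(score, 3))
```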
Environmental perception is vital for service robots that operate in indoor environments over long periods. General 3D reconstruction provides only a low-level geometric description and cannot convey semantics. In contrast, human-like high-level perception requires more abstract concepts, such as objects and scenes. Moreover, image-based 2D object detection cannot provide the actual position and size of an object, which are important for a robot's operation. In this paper, we focus on 3D object detection, regressing an object's category, 3D size, and spatial position with a convolutional neural network (CNN). We propose a multi-channel CNN for 3D object detection that fuses three input channels: RGB, depth, and bird's eye view (BEV) images. We also propose a method that generates 3D proposals from 2D proposals in the RGB image and a semantic prior. Training and testing are conducted on the modified NYU V2 and SUN RGB-D datasets to verify the effectiveness of the algorithm. We also carry out real-world experiments on a service robot, using the proposed 3D object detection method to enhance its environmental perception.
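The abstract names the three input channels but not the network layout, so the following PyTorch sketch is purely an assumption about how such a fusion could look: one small convolutional branch per channel, concatenated features, and two heads for category scores and a 3D box (center and size).

```python
# Hypothetical three-branch fusion network; architecture and sizes are illustrative.
import torch
import torch.nn as nn

class ThreeChannelFusionNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rgb_branch = branch(3)     # RGB image
        self.depth_branch = branch(1)   # depth image
        self.bev_branch = branch(1)     # bird's eye view image
        self.cls_head = nn.Linear(32 * 3, num_classes)  # object category
        self.box_head = nn.Linear(32 * 3, 6)            # 3D center + size

    def forward(self, rgb, depth, bev):
        fused = torch.cat([self.rgb_branch(rgb),
                           self.depth_branch(depth),
                           self.bev_branch(bev)], dim=1)
        return self.cls_head(fused), self.box_head(fused)

# Usage with dummy tensors shaped like a single 64x64 proposal crop per channel.
net = ThreeChannelFusionNet()
scores, box = net(torch.rand(1, 3, 64, 64),
                  torch.rand(1, 1, 64, 64),
                  torch.rand(1, 1, 64, 64))
print(scores.shape, box.shape)
```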
In the real world, the index of refraction of a refractive (caustic) object varies across wavelengths. Physically based caustic rendering therefore needs to take spectral information into account, but doing so can lead to prohibitive running times. In response, we propose a two-step acceleration scheme for spectral caustic rendering. Our scheme exploits information across the visible wavelengths of the scene: the index of refraction (IOR) of the caustic object, the power of the light source, and the material reflectance of surrounding surfaces. First, we cluster wavelengths that have similar first-refraction (air to caustic object) directions, so that all wavelengths in a cluster can be represented by one light ray during rendering. Second, by considering the surrounding objects (their material reflectance and the surface area visible from the caustic objects) together with the light power, we compute the amount of refinement for each wavelength cluster. Our accelerated algorithm produces photorealistic results close to the reference images (generated by rendering the visible spectrum at every 1 nm) with a significant speedup. Computational experiments and comparative analyses are reported in the paper.
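A minimal sketch of the wavelength-clustering idea follows: wavelengths whose first (air to caustic object) refraction directions differ by less than an angular threshold share one representative ray. The Cauchy IOR coefficients and the threshold below are illustrative assumptions, not values from the paper.

```python
# Hypothetical clustering of wavelengths by first-refraction direction.
import numpy as np

def ior_cauchy(wavelength_nm, a=1.5046, b=4200.0):
    """Wavelength-dependent index of refraction (Cauchy's equation, assumed coefficients)."""
    return a + b / wavelength_nm**2

def refract(incident, normal, n1, n2):
    """Snell's law for unit vectors; returns the refracted direction."""
    cos_i = -np.dot(normal, incident)
    eta = n1 / n2
    k = 1.0 - eta**2 * (1.0 - cos_i**2)
    return eta * incident + (eta * cos_i - np.sqrt(k)) * normal

def cluster_wavelengths(wavelengths, incident, normal, angle_threshold_deg=0.05):
    """Group consecutive wavelengths whose refracted directions stay within the threshold."""
    clusters, rep_dirs = [], []
    for wl in wavelengths:
        d = refract(incident, normal, 1.0, ior_cauchy(wl))
        if rep_dirs and np.degrees(
                np.arccos(np.clip(np.dot(rep_dirs[-1], d), -1.0, 1.0))) < angle_threshold_deg:
            clusters[-1].append(wl)      # close enough: reuse the cluster's representative ray
        else:
            clusters.append([wl])
            rep_dirs.append(d)
    return clusters

# Usage: cluster the visible range sampled every 1 nm.
incident = np.array([0.0, -np.sqrt(0.5), -np.sqrt(0.5)])
normal = np.array([0.0, 0.0, 1.0])
clusters = cluster_wavelengths(range(380, 781), incident, normal)
print(len(clusters), "clusters for", 781 - 380, "wavelengths")
```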