Abstract: Scene understanding is an active research area. Commercial depth sensors, such as Kinect, have enabled the release of several RGB-D datasets over the past few years, which spawned novel methods in 3D scene understanding. More recently, with the launch of the LiDAR sensor in Apple's iPads and iPhones, high-quality RGB-D data is accessible to millions of people on a device they commonly use. This opens a whole new era in scene understanding for the Computer Vision community as well as app developers. The fundament…
“…Kinect V1/V2 [11], OAK-D-Lite [12], and even the iPhone's back-facing camera satisfy the database requirements [13]. A depth image produced using an iPhone camera is shown below in Figure 2. After experimentation with several cameras (iPhone X, OAK-D-Lite, Intel RealSense L515, Kinect V1, Kinect V2), it was decided to use the Kinect V2, as it produced the most detailed depth map of all tested cameras.…”
A large number of robotic and human-assisted missions to the Moon and Mars are forecast. NASA's efforts to learn about the geology and makeup of these celestial bodies rely heavily on the use of robotic arms. Safety and redundancy will be crucial when humans work alongside the robotic explorers. Additionally, robotic arms are crucial to satellite servicing and planned orbital debris mitigation missions. The goal of this work is to create a custom Computer Vision (CV) based Artificial Neural Network (ANN) that can rapidly identify the posture of a 7 Degree of Freedom (DoF) robotic arm from a single RGB-D image, just as humans can easily tell whether an arm is pointing in some general direction. The Sawyer robotic arm is used for developing and training this intelligent algorithm. Since Sawyer's joint space spans 7 dimensions, covering the entire joint configuration space exhaustively is an insurmountable task. In this work, orthogonal arrays are used, similar to the Taguchi method, to efficiently span the joint space with a minimal number of training images. This "optimally" generated database is used to train the custom ANN, whose accuracy is on average equal to twice the smallest joint displacement step used for database generation. A pre-trained ANN will be useful for estimating the postures of robotic manipulators used on space stations, spacecraft, and rovers, either as an auxiliary tool or for contingency plans.
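To make the orthogonal-array idea concrete, here is a minimal sketch built on the standard Taguchi L8(2^7) array, which covers 7 two-level factors in only 8 runs. The joint limits, the two-level discretization, and the level-to-angle mapping below are illustrative assumptions; the paper's database presumably uses finer joint steps and correspondingly larger arrays.

```python
import numpy as np

# Standard Taguchi L8(2^7) orthogonal array: 8 runs covering 7 two-level
# factors so that every pair of levels appears equally often per column pair.
L8 = np.array([
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 2, 2, 2, 2],
    [1, 2, 2, 1, 1, 2, 2],
    [1, 2, 2, 2, 2, 1, 1],
    [2, 1, 2, 1, 2, 1, 2],
    [2, 1, 2, 2, 1, 2, 1],
    [2, 2, 1, 1, 2, 2, 1],
    [2, 2, 1, 2, 1, 1, 2],
])

# Hypothetical joint limits (radians); Sawyer's real limits differ per joint.
joint_min = np.full(7, -1.0)
joint_max = np.full(7, 1.0)

# Map level 1 -> joint_min and level 2 -> joint_max, column by column.
levels = (L8 - 1).astype(float)                     # values in {0.0, 1.0}
postures = joint_min + levels * (joint_max - joint_min)

# 8 camera shots instead of 2**7 = 128 for a full two-level factorial sweep.
print(postures.shape)  # (8, 7): one 7-joint posture per row
```

Each row would correspond to one posture to photograph; with more levels per joint, a larger orthogonal array plays the same role of sampling the configuration space evenly without enumerating it.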
“…Its elements correspond to the column-index in the matrix of corners:

[[2,8], [3,8], [1,3], [4,7], [7,5], [6,5], [4,6], [1,4], [2,7], [8,5], [3,6]]   (4)

[[1,2,4], [1,3,4], [5,6,7], [5,6,8], [5,7,8]]…”
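As an illustrative aside, such a corner matrix and edge-index list can be constructed programmatically. The sketch below assumes a simple binary corner ordering for an axis-aligned unit box, so its 0-based edge indices differ from the paper's 1-based list quoted above.

```python
import numpy as np
from itertools import product

# Corners of an axis-aligned unit box as a 3 x 8 matrix; the binary
# ordering here is an assumption and differs from the paper's numbering.
corners = np.array(list(product([0.0, 1.0], repeat=3))).T

# An edge connects two corners that differ in exactly one coordinate,
# which recovers the 12 edges as pairs of column indices (0-based here).
edges = [(i, j)
         for i in range(8)
         for j in range(i + 1, 8)
         if np.sum(corners[:, i] != corners[:, j]) == 1]

print(len(edges))   # 12
print(edges[:3])    # [(0, 1), (0, 2), (0, 4)]
```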
Section: Definition of a Bounding Box
“…As 3D object detection gets more popular and new datasets are published [1,2,3], evaluation metrics gain in importance. The most common one is Intersection over Union (IoU).…”
The most popular evaluation metric for object detection in 2D images is Intersection over Union (IoU). Existing implementations of the IoU metric for 3D object detection usually neglect one or more degrees of freedom. In this paper, we first derive the analytic solution for three-dimensional bounding boxes. As a second contribution, a closed-form solution of the volume-to-volume distance is derived. Finally, the Bounding Box Disparity is proposed as a combined, positive, continuous metric. We provide open-source implementations of the three metrics as standalone Python functions, as well as extensions to the Open3D library and as ROS nodes.
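For intuition, here is a hedged sketch of the metric in its simplest setting: IoU for axis-aligned boxes only. The paper's actual contribution is the analytic solution for fully oriented 3D boxes (plus the volume-to-volume distance and Bounding Box Disparity built on top), all of which this toy version deliberately omits.

```python
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (min_xyz, max_xyz) arrays."""
    min_a, max_a = box_a
    min_b, max_b = box_b
    # Per-axis overlap length, clamped to zero when the boxes are disjoint.
    overlap = np.clip(np.minimum(max_a, max_b) - np.maximum(min_a, min_b),
                      0.0, None)
    inter = overlap.prod()
    union = (max_a - min_a).prod() + (max_b - min_b).prod() - inter
    return inter / union if union > 0 else 0.0

# Unit cube vs. a unit cube shifted by 0.5 along every axis.
a = (np.zeros(3), np.ones(3))
b = (np.full(3, 0.5), np.full(3, 1.5))
print(iou_3d_axis_aligned(a, b))  # 0.125 / 1.875 ≈ 0.0667
```

Once the boxes may rotate, the intersection is a general convex polyhedron rather than another axis-aligned box, which is exactly why the analytic treatment in the paper is needed.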
“…On-device depth estimation is critical in navigation [40], gaming [6], and augmented/virtual reality [3,8]. Previously, various solutions based on stereo/structured-light sensors and indirect time-of-flight (iToF) sensors [4, 34, 55] have been proposed.…”
Section: Introduction
“…Each dToF pixel captures and pre-processes depth information from a local patch in the scene (Sec. 3), leading to high spatial ambiguity when estimating the high-resolution depth maps for downstream tasks [8]. Previous RGB-guided depth completion and super-resolution algorithms either assume high resolution spatial information (e.g.…”
Figure 1. We propose the first multi-frame approaches, dToF depth video super-resolution (DVSR) and histogram video super-resolution (HVSR), to super-resolve low-resolution dToF sensor videos with high-resolution RGB frame guidance. The point cloud visualizations of depth predictions reveal that, by utilizing multi-frame correlation, DVSR predicts significantly better geometry than state-of-the-art per-frame depth enhancement networks [41] while being more lightweight; HVSR further improves the fidelity of the geometry and reduces flying pixels by utilizing the dToF histogram information. Beyond the improvements in per-frame estimation, we highly recommend readers check out the supplementary video, which visualizes the significant improvements in temporal stability across the entire sequences.
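To illustrate the patch aggregation the quoted passage describes, here is a toy simulation; the simplified sensor model is our assumption, not the authors' exact formulation. Each low-resolution dToF pixel collects a per-patch depth histogram, and collapsing that histogram to a single reading shows where the spatial ambiguity that DVSR/HVSR must resolve comes from.

```python
import numpy as np

def simulate_dtof(depth_hr, patch=8, bins=16, max_depth=10.0):
    """Toy dToF model: each low-res pixel histograms one high-res patch."""
    h, w = depth_hr.shape
    h_lr, w_lr = h // patch, w // patch
    edges = np.linspace(0.0, max_depth, bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    hist = np.zeros((h_lr, w_lr, bins))
    depth_lr = np.zeros((h_lr, w_lr))
    for i in range(h_lr):
        for j in range(w_lr):
            block = depth_hr[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            counts, _ = np.histogram(block, bins=edges)
            hist[i, j] = counts
            # Collapsing the histogram to its peak bin discards everything
            # else in the patch: the spatial ambiguity described above.
            depth_lr[i, j] = centers[np.argmax(counts)]
    return depth_lr, hist

depth_hr = np.random.uniform(0.5, 9.5, size=(64, 64))   # synthetic scene
depth_lr, hist = simulate_dtof(depth_hr)
print(depth_lr.shape, hist.shape)  # (8, 8) (8, 8, 16)
```

Keeping the full `hist` tensor rather than only `depth_lr` mirrors the distinction between the HVSR and DVSR inputs: the histogram retains multi-modal depth evidence within each patch that a single per-pixel depth value throws away.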