Expandable YOLO: 3D Object Detection from RGB-D Images

Takahashi, Masahiro; Ji, Yonghoon; Umeda, Kazunori; Moro, Alessandro

doi:10.1109/rem49740.2020.9313886

Cited by 20 publications

(12 citation statements)

References 9 publications

“…Most deep learning stair detection methods [4, 7, 8] focus on extracting stair features in monocular vision through a CNN, and there is no deep learning method to make full use of the complementary relationship between the RGB map and the depth map for stair detection. Regarding the RGB-D fusion methods for deep learning, some methods fuse features in the input and output locations by simple summation and concatenation [14, 15, 16, 17, 18], and some methods design special modules to explore the implicit relationship between the two modalities [19, 20, 21, 22]. This section briefly introduces some RGB-D-based stair detection methods and some RGB-D fusion methods for deep learning.…”

Section: Related Work (mentioning)

confidence: 99%

RGB-D-Based Stair Detection and Estimation Using Deep Learning

Wang, Pei, Qiu, et al. 2023, Sensors

Stairs are common vertical traffic structures in buildings, and stair detection tasks are important in environmental perception for autonomous mobile robots. Most existing algorithms have difficulty combining the visual information from binocular sensors effectively and ensuring reliable detection at night and in the case of extremely fuzzy visual clues. To solve these problems, we propose a stair detection network with red-green-blue (RGB) and depth inputs. Specifically, we design a selective module, which can make the network learn the complementary relationship between the RGB feature maps and the depth feature maps and fuse the features effectively in different scenes. In addition, we propose several postprocessing algorithms, including a stair line clustering algorithm and a coordinate transformation algorithm, to obtain the stair geometric parameters. Experiments show that our method has better performance than the existing state-of-the-art deep learning method, and the accuracy, recall, and runtime are improved by 5.64%, 7.97%, and 3.81 ms, respectively. The improved indexes show the effectiveness of the multimodal inputs and the selective module. The estimation values of stair geometric parameters have root mean square errors within 15 mm when ascending stairs and 25 mm when descending stairs. Our method also has extremely fast detection speed, which can meet the requirements of most real-time applications.

“…This is in itself a vast field of research, fueled in the last years by the great interest of the big technological actors and the advent of deep learning. YOLO [81] represented a big leap for object detection in 2D images, and 3D versions have been proposed [82, 83, 84]. Other approaches based on 3D descriptors [85, 86, 87] or other deep learning architectures [88, 89, 90] have also been the subject of research.…”

Section: Applications (mentioning)

confidence: 99%

RANSAC for Robotic Applications: A Survey

Martínez-Otzeta, Rodríguez-Moreno, Mendialdua, et al. 2022, Sensors

Random Sample Consensus, most commonly abbreviated as RANSAC, is a robust estimation method for the parameters of a model contaminated by a sizable percentage of outliers. In its simplest form, the process starts with a sampling of the minimum data needed to perform an estimation, followed by an evaluation of its adequacy, and further repetitions of this process until some stopping criterion is met. Multiple variants have been proposed in which this workflow is modified, typically tweaking one or several of these steps for improvements in computing time or the quality of the estimation of the parameters. RANSAC is widely applied in the field of robotics, for example, for finding geometric shapes (planes, cylinders, spheres, etc.) in cloud points or for estimating the best transformation between different camera views. In this paper, we present a review of the current state of the art of RANSAC family methods with a special interest in applications in robotics.

“…As a consequence, they use 2D CNN methods and do not fully exploit the 3D data geometry of the objects, although they achieve good recognition performances, especially in the case of occlusions [38]. An alternative method for object recognition by a depth camera is to include the depth channel along RGB channels (RGB-D) in combination with a 2D CNN [39] and recursive neural networks (RNNs) [40] or encode the depth channel in jet color maps and the surface of normals [41]. Consequently, RGB-D methods for 3D recognition do not fully exploit 3D geometric information, but they reduce the hardware requirements compared to model-based methods and thus enable real-time applications, which is the intended use for contactless PW control.…”

Section: Related Work (mentioning)

confidence: 99%

Evaluation of 2D-/3D-Feet-Detection Methods for Semi-Autonomous Powered Wheelchair Navigation

et al. 2021

Powered wheelchairs have enhanced the mobility and quality of life of people with special needs. The next step in the development of powered wheelchairs is to incorporate sensors and electronic systems for new control applications and capabilities to improve their usability and the safety of their operation, such as obstacle avoidance or autonomous driving. However, autonomous powered wheelchairs require safe navigation in different environments and scenarios, making their development complex. In our research, we propose, instead, to develop contactless control for powered wheelchairs where the position of the caregiver is used as a control reference. Hence, we used a depth camera to recognize the caregiver and measure at the same time their relative distance from the powered wheelchair. In this paper, we compared two different approaches for real-time object recognition using a 3DHOG hand-crafted object descriptor based on a 3D extension of the histogram of oriented gradients (HOG) and a convolutional neural network based on YOLOv4-Tiny. To evaluate both approaches, we constructed Miun-Feet—a custom dataset of images of labeled caregiver’s feet in different scenarios, with backgrounds, objects, and lighting conditions. The experimental results showed that the YOLOv4-Tiny approach outperformed 3DHOG in all the analyzed cases. In addition, the results showed that the recognition accuracy was not improved using the depth channel, enabling the use of a monocular RGB camera only instead of a depth camera and reducing the computational cost and heat dissipation limitations. Hence, the paper proposes an additional method to compute the caregiver’s distance and angle from the Powered Wheelchair (PW) using only the RGB data. This work shows that it is feasible to use the location of the caregiver’s feet as a control signal for the control of a powered wheelchair and that it is possible to use a monocular RGB camera to compute their relative positions.
