2019
DOI: 10.3390/s19040866
Exploring RGB+Depth Fusion for Real-Time Object Detection

Abstract: In this paper, we investigate whether fusing depth information on top of normal RGB data for camera-based object detection can help to increase the performance of current state-of-the-art single-shot detection networks. Indeed, depth information is easily acquired using depth cameras such as a Kinect or stereo setups. We investigate the optimal manner to perform this sensor fusion with a special focus on lightweight single-pass convolutional neural network (CNN) architectures, enabling real-time processing on limi…

Cited by 63 publications (34 citation statements)
References 31 publications
“…These groups have used a variety of neural network architectures. Some authors input explicit depth and imaging data on independent channels that are processed separately through several network layers [4], [18], [29], [34]. After some processing, the depth and radiance channels are fused.…”
Section: Related Work
confidence: 99%
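This independent-stream design maps directly onto modern frameworks. Below is a minimal PyTorch sketch (the framework choice and all layer widths are my assumptions, not taken from the cited works) in which RGB and depth pass through separate convolutional stems for a few layers before their feature maps are concatenated and processed by a shared trunk.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One downsampling conv + BN + ReLU stage; widths are illustrative only.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TwoStreamFusionNet(nn.Module):
    """RGB and depth processed in independent streams, fused mid-network."""

    def __init__(self):
        super().__init__()
        # Independent stems: 3-channel RGB and 1-channel depth.
        self.rgb_stream = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.depth_stream = nn.Sequential(conv_block(1, 32), conv_block(32, 64))
        # Shared trunk after fusion; concatenation doubles the channel count.
        self.trunk = nn.Sequential(conv_block(128, 128), conv_block(128, 256))

    def forward(self, rgb, depth):
        f_rgb = self.rgb_stream(rgb)        # (N, 64, H/4, W/4)
        f_depth = self.depth_stream(depth)  # (N, 64, H/4, W/4)
        fused = torch.cat([f_rgb, f_depth], dim=1)  # channel-wise fusion
        return self.trunk(fused)

# Example: a 416x416 input, as commonly used with YOLO-style detectors.
net = TwoStreamFusionNet()
out = net(torch.randn(2, 3, 416, 416), torch.randn(2, 1, 416, 416))
```

Concatenation is the simplest fusion operator here; elementwise addition or learned gating are common alternatives in the literature.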
“…This design raises the question of which layer is best for merging the independent channels; one might expect that the answer depends on both the network and the data. Using a YOLOv2 network, [29] explored how variations in the merged layer influenced performance. The portions of their analysis most relevant to our work detected vehicles using data obtained from the KITTI database.…”
Section: Related Work
confidence: 99%
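Where to place that merge can be treated as a hyperparameter and swept empirically, which is the spirit of the experiment in [29]. The sketch below is hypothetical (the `FuseAtNet` name, the stage widths, and the `fuse_at` parameter are mine, not from the paper): each value of `fuse_at` yields a variant that fuses earlier or later, and the variants can then be trained and compared on a dataset such as KITTI.

```python
import torch
import torch.nn as nn

WIDTHS = [32, 64, 128, 256]  # illustrative stage widths, not from [29]

def stage(in_ch, out_ch):
    # One downsampling conv stage.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )

class FuseAtNet(nn.Module):
    """Two-stream backbone whose merge point `fuse_at` is a hyperparameter:
    stages before it run separately on RGB and depth, stages after it run
    on the concatenated features."""

    def __init__(self, fuse_at: int):
        super().__init__()
        assert 0 <= fuse_at <= len(WIDTHS)
        self.rgb_stages = nn.ModuleList()
        self.depth_stages = nn.ModuleList()
        self.shared_stages = nn.ModuleList()
        c_rgb, c_depth, c_shared = 3, 1, None
        for i, w in enumerate(WIDTHS):
            if i < fuse_at:  # pre-fusion: independent streams
                self.rgb_stages.append(stage(c_rgb, w))
                self.depth_stages.append(stage(c_depth, w))
                c_rgb = c_depth = w
            else:            # post-fusion: one shared trunk
                c_in = c_rgb + c_depth if c_shared is None else c_shared
                self.shared_stages.append(stage(c_in, w))
                c_shared = w

    def forward(self, rgb, depth):
        for s_rgb, s_depth in zip(self.rgb_stages, self.depth_stages):
            rgb, depth = s_rgb(rgb), s_depth(depth)
        x = torch.cat([rgb, depth], dim=1)  # the merge point
        for s in self.shared_stages:
            x = s(x)
        return x

# Sweep the merge point; each variant would be trained and evaluated.
for fuse_at in range(len(WIDTHS) + 1):
    net = FuseAtNet(fuse_at)
    y = net(torch.randn(1, 3, 416, 416), torch.randn(1, 1, 416, 416))
```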
“…Further, the authors introduce the 3D IoU with Volume of Overlap and Volume of Union for a better 3D bounding box proposal. Ophoff et al. [44] propose a different approach: they use separate network streams for the RGB and the depth information and fuse them with a concatenation layer.…”
Section: You Only Look Once
confidence: 99%
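As an aside on the 3D IoU mentioned in this snippet: it generalizes the usual 2D overlap criterion to IoU = Volume of Overlap / Volume of Union. A minimal sketch for axis-aligned boxes follows; the (x1, y1, z1, x2, y2, z2) box encoding is an assumption, and oriented 3D boxes would require a polyhedral intersection instead of this per-axis clamp.

```python
def iou_3d(a, b):
    """3D IoU for axis-aligned boxes given as (x1, y1, z1, x2, y2, z2)."""
    # Overlap extent along each axis (zero if the boxes are disjoint).
    dx = max(0.0, min(a[3], b[3]) - max(a[0], b[0]))
    dy = max(0.0, min(a[4], b[4]) - max(a[1], b[1]))
    dz = max(0.0, min(a[5], b[5]) - max(a[2], b[2]))
    v_overlap = dx * dy * dz
    vol = lambda box: (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])
    v_union = vol(a) + vol(b) - v_overlap
    return v_overlap / v_union if v_union > 0 else 0.0

# Two unit cubes sharing half their volume along x: IoU = 0.5 / 1.5 = 1/3.
print(iou_3d((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)))
```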
“…Some detection methods using RGB-D two-stream networks have achieved good results. Ophoff et al. [29] explored the best position in the CNN to fuse RGB and depth information, concluding that the best results are obtained by fusing features in the mid to late layers. Gupta et al. [30] proposed a depth-map encoding method called HHA (horizontal disparity, height above ground, and angle with respect to gravity), which encodes the depth map into a three-channel image like an RGB image.…”
Section: Introduction
confidence: 99%
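The HHA encoding of [30] turns a single depth channel into three geocentric channels so that networks pretrained on three-channel RGB inputs can ingest depth. A simplified NumPy sketch is below; it assumes a known focal length and a camera whose y-axis is already gravity-aligned, whereas Gupta et al. estimate the gravity direction from the data, so this is only an approximation of the real pipeline, with illustrative constants throughout.

```python
import numpy as np

def hha_encode(depth_m, fx=525.0, baseline_m=0.075, cam_height_m=1.0):
    """Approximate HHA encoding of a metric depth map (H x W, meters).

    Channels: horizontal disparity, height above ground, and angle of the
    surface normal with gravity. Assumes a gravity-aligned camera with
    y pointing down; fx, baseline_m, and cam_height_m are assumptions.
    """
    h, w = depth_m.shape
    eps = 1e-6

    # 1) Horizontal disparity: inversely proportional to depth.
    disparity = fx * baseline_m / np.maximum(depth_m, eps)

    # 2) Height above ground: back-project pixel rows to metric y.
    v = np.arange(h)[:, None] - h / 2.0   # pixel row offset from center
    y_cam = v * depth_m / fx              # metric y in camera frame (down)
    height = cam_height_m - y_cam         # height over an assumed ground plane

    # 3) Angle with gravity: approximate surface normals from depth gradients.
    dzdx = np.gradient(depth_m, axis=1)
    dzdy = np.gradient(depth_m, axis=0)
    normals = np.stack([-dzdx, -dzdy, np.ones_like(depth_m)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + eps
    up = np.array([0.0, -1.0, 0.0])       # "up" with y pointing down
    angle = np.degrees(np.arccos(np.clip(normals @ up, -1.0, 1.0)))

    # Linearly rescale each channel to 0..255, as in typical HHA usage.
    def to_u8(x):
        x = x - x.min()
        return (255.0 * x / (x.max() + eps)).astype(np.uint8)

    return np.stack([to_u8(disparity), to_u8(height), to_u8(angle)], axis=-1)
```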