2019
DOI: 10.1109/access.2019.2907071
|View full text |Cite
|
Sign up to set email alerts
|

RGB-D-Based Object Recognition Using Multimodal Convolutional Neural Networks: A Survey

Abstract: Object recognition in real-world environments is one of the fundamental and key tasks in computer vision and robotics communities. With the advanced sensing technologies and low-cost depth sensors, the high-quality RGB and depth images can be recorded synchronously, and the object recognition performance can be improved by jointly exploiting them. RGB-D-based object recognition has evolved from early methods that using hand-crafted representations to the current state-of-the-art deep learning-based methods. Wi… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
25
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
6
3

Relationship

0
9

Authors

Journals

citations
Cited by 50 publications
(25 citation statements)
references
References 183 publications
(279 reference statements)
0
25
0
Order By: Relevance
“…To perform image classification directly, the authors of [217,218] suggested the possibility of using multi-stream CNNs (i.e., two or more stream CNNs) to extract robust features from a final hidden layer and then project them onto a common representation space. However, the most commonly adopted approaches involve concatenating a set of pre-trained features derived from the huge ImageNet dataset to generate a multimodal representation [216].…”
Section: Convolutional Neural Network Basedmentioning
confidence: 99%
“…To perform image classification directly, the authors of [217,218] suggested the possibility of using multi-stream CNNs (i.e., two or more stream CNNs) to extract robust features from a final hidden layer and then project them onto a common representation space. However, the most commonly adopted approaches involve concatenating a set of pre-trained features derived from the huge ImageNet dataset to generate a multimodal representation [216].…”
Section: Convolutional Neural Network Basedmentioning
confidence: 99%
“…Negin et al 28 used the CNN model to make a simple exploration of static gesture recognition. Inspired by Wang's work, Gao et al 29 proposed a recognition multimodal feature learning method for RGB‐D scene recognition.…”
Section: Related Workmentioning
confidence: 99%
“…In contrast, the 3D camera not only obtains the plane feature information of the object but also acquires the spatial position information. As a result, the unknown object recognition by 3D cameras has become a research hotspot in recent years [ 106 ]. Compared with passive sensors, such as cameras, lidars, etc., the tactile sensor inspired by human touch acquires object feature information by active contact perception.…”
Section: Unknown Objectsmentioning
confidence: 99%