The emergence of an RGB-D (Red-Green-Blue-Depth)
Keywords: RGB-D images, local descriptor, object recognition, depth images
Introduction
Object recognition is an important problem in computer science and has attracted the interest of researchers in computer vision, machine learning and robotics [1]. The core of building an object recognition system is extracting meaningful representations (features) from high-dimensional observations such as images, videos and 3D point clouds [2]. Satisfactory results have been reported for a variety of methods, applications and standard benchmark datasets. Nevertheless, recognizing everyday objects in scene images remains an open problem. The major challenges in a visual object recognition system fall into two groups: those related to system robustness, and those related to computational complexity and scalability. The first group includes handling intra-class variations in appearance (differences in appearance among objects of the same category) as well as inter-class variations. Instances of the same object category can produce different images because of variations in illumination, object pose, camera viewpoint, partial occlusion and background clutter. The second group includes challenges such as the very large number of object categories, high-dimensional descriptors, and the difficulty of obtaining unambiguously labelled training samples [3].
To address these two groups of challenges, [3] identifies three aspects: appearance modelling, localization strategies and supervised classification. Researchers have focused on developing techniques and algorithms in these three aspects to improve the performance of visual object recognition systems. Of the three, appearance modelling is the most important [3]. Appearance modelling focuses on selecting features that can handle various types of intra-class variation while capturing the discriminative aspects of different categories.
Furthermore, [4] states that "the next step in the evolution of object recognition algorithm will require radical and bold steps forward in terms of the object representations, as well as the learning and inference algorithm used".
The emergence of relatively inexpensive RGB-D sensors (Microsoft Kinect, Asus Xtion, and PrimeSense) promises to improve object recognition performance. These sensors provide a depth value for each pixel, which considerably enriches the available image information. An RGB-D sensor combines an RGB camera with an infrared projector and an infrared camera, so it can capture a colour image together with the depth of each pixel in that image. These capabilities are very helpful for image processing, which has traditionally depended only on the colour channels of the image [5], [6]. By using the depth channel for foreground segmentation or complementary information on