Canonical Correlation Analysis Regularization: An Effective Deep Multiview Learning Baseline for RGB-D Object Recognition

Tang, Lulu; Yang, Zhixin; Jia, Kui

doi:10.1109/tcds.2018.2866587

Cited by 30 publications

(21 citation statements)

References 35 publications


“…However, as shown in Figure 3b–f, the direct method may deform the object’s original ratio and geometric structure, which will influence the recognition performance. So, we used the scaling processing method proposed in [33]. At first, we resized the origin image so that the length of its long side becomes 227 pixels.…”

Section: Proposed Methods (mentioning)

confidence: 99%
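The first scaling step quoted above (resize so that the long side becomes 227 pixels) can be sketched as follows. This is a minimal illustration of aspect-preserving size computation only; the function name `long_side_resize_shape` is hypothetical, and the later steps of the cited method are elided in the quotation, so they are not reproduced here.

```python
def long_side_resize_shape(width, height, target=227):
    """Return (new_width, new_height) so that the longer side equals `target`
    and the aspect ratio is preserved (up to rounding).

    Hypothetical helper for illustration; not taken from the cited papers.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width >= height:
        # Landscape or square: the width is the long side.
        new_w = target
        new_h = max(1, round(height * target / width))
    else:
        # Portrait: the height is the long side.
        new_h = target
        new_w = max(1, round(width * target / height))
    return new_w, new_h


# Example: a 640x480 image keeps its 4:3 ratio after resizing.
landscape = long_side_resize_shape(640, 480)
portrait = long_side_resize_shape(300, 400)
```

Unlike a direct resize to 227 x 227, the short side here stays proportionally shorter, which is what avoids the deformation the citing authors describe.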

“…The three streams include surface normal, color jet, and RGB channel. Tang et al proposed a canonical correlation analysis (CCA) based multi-view convolutional neural networks for RGB-D object recognition, which can effectively identify the associations between different perspectives of the same shaped model [33]. Zia et al proposed a hybrid 2D/3D convolutional neural network for RGB-D object recognition, which can be initialized with pretrained 2D CNN and can be trained over a relatively small RGB-D dataset [34].…”

Section: Related Work (mentioning)

confidence: 99%


RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory

Zeng, Yang, Wang, et al. 2019. Sensors


With the development of low-cost RGB-D (Red Green Blue-Depth) sensors, RGB-D object recognition has attracted more and more researchers’ attention in recent years. The deep learning technique has become popular in the field of image analysis and has achieved competitive results. To make full use of the effective identification information in the RGB and depth images, we propose a multi-modal deep neural network and a DS (Dempster Shafer) evidence theory based RGB-D object recognition method. First, the RGB and depth images are preprocessed and two convolutional neural networks are trained, respectively. Next, we perform multi-modal feature learning using the proposed quadruplet samples based objective function to fine-tune the network parameters. Then, two probability classification results are obtained using two sigmoid SVMs (Support Vector Machines) with the learned RGB and depth features. Finally, the DS evidence theory based decision fusion method is used for integrating the two classification results. Compared with other RGB-D object recognition methods, our proposed method adopts two fusion strategies: Multi-modal feature learning and DS decision fusion. Both the discriminative information of each modality and the correlation information between the two modalities are exploited. Extensive experimental results have validated the effectiveness of the proposed method.
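The decision-fusion step mentioned in the abstract can be illustrated with the standard Dempster rule of combination. The sketch below is an assumption-laden example, not the authors' implementation: the abstract does not say how mass functions are built from the SVM outputs, so the masses, class names, and the function name `dempster_combine` are all hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of class labels (a focal element)
    to its mass; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    normalizer = 1.0 - conflict
    if normalizer <= 0.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {focal: mass / normalizer for focal, mass in combined.items()}


# Hypothetical masses from an RGB classifier and a depth classifier.
APPLE = frozenset({"apple"})
BANANA = frozenset({"banana"})
EITHER = APPLE | BANANA  # mass left on "unknown"

m_rgb = {APPLE: 0.6, BANANA: 0.3, EITHER: 0.1}
m_depth = {APPLE: 0.5, BANANA: 0.2, EITHER: 0.3}

fused = dempster_combine(m_rgb, m_depth)
predicted = max((APPLE, BANANA), key=lambda c: fused.get(c, 0.0))
```

In this toy case both sources lean toward the same class, so the fused mass on that class is higher than either source assigned individually.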


“…They used a fusion saliency map of objects and a centered darker channel for object segmentation, multiple feature descriptors, feature matching, and Hough voting for the recognition of multiple objects over the RGB-D dataset. L. Tang et al [20] designed a convolution neural network framework based on canonical correlation analysis (CCA). They fused separately processed RGB and depth images through a CCA layer and a combining layer was introduced to the multi-view CNN.…”

Section: Sustainable Multi-objects Recognition Via Depth Images (mentioning)

confidence: 99%

Automated Sustainable Multi-Object Segmentation and Recognition via Modified Sampling Consensus and Kernel Sliding Perceptron

2020


Object recognition in depth images is a challenging and persistent task in machine vision, robotics, and automation of sustainability. Object recognition tasks are a challenging part of various multimedia technologies for video surveillance, human–computer interaction, robotic navigation, drone targeting, tourist guidance, and medical diagnostics. However, the symmetry that exists in real-world objects plays a significant role in perception and recognition of objects in both humans and machines. With advances in depth sensor technology, numerous researchers have recently proposed RGB-D object recognition techniques. In this paper, we introduce a sustainable object recognition framework that is consistent despite any change in the environment, and can recognize and analyze RGB-D objects in complex indoor scenarios. Firstly, after acquiring a depth image, the point cloud and the depth maps are extracted to obtain the planes. Then, the plane fitting model and the proposed modified maximum likelihood estimation sampling consensus (MMLESAC) are applied as a segmentation process. Then, depth kernel descriptors (DKDES) over segmented objects are computed for single and multiple object scenarios separately. These DKDES are subsequently carried forward to isometric mapping (IsoMap) for feature space reduction. Finally, the reduced feature vector is forwarded to a kernel sliding perceptron (KSP) for the recognition of objects. Three datasets are used to evaluate four different experiments by employing a cross-validation scheme to validate the proposed model. The experimental results over RGB-D object, RGB-D scene, and NYUDv1 datasets demonstrate overall accuracies of 92.2%, 88.5%, and 90.5% respectively. These results outperform existing state-of-the-art methods and verify the suitability of the method.


“…The model contains three parts: deep networks (input layer, hidden layers, and output layer), feature concatenation and softmax classifier. The concatenation occurs at a higher layer instead of the input layer since concatenation at the input layer often causes 1) intractable training effort; 2) over-fitting due to prematurely learned features from both modalities; and 3) failure to learn implicit associations between modalities with different underlying features [48]. This model first learns the two modalities separately with two different flows, and then concatenate their features at a higher layer.…”

Section: The Multi-modal Model With Simple Concatenation (mentioning)

confidence: 99%
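The higher-layer concatenation described in the statement above can be sketched in a few lines. This is a toy forward pass under stated assumptions, not the citing paper's network: each modality gets a single hypothetical dense ReLU layer, the weights are made up, and names such as `late_fusion_predict` are illustrative only.

```python
import math


def dense_relu(x, weights, bias):
    """One fully connected layer followed by ReLU (plain Python lists)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_predict(x_a, x_b, params):
    """Learn each modality in its own flow, then concatenate at a higher layer."""
    h_a = dense_relu(x_a, params["W_a"], params["b_a"])  # flow for modality A
    h_b = dense_relu(x_b, params["W_b"], params["b_b"])  # flow for modality B
    fused = h_a + h_b  # list concatenation of the two hidden feature vectors
    logits = [
        sum(w * h for w, h in zip(row, fused)) + b
        for row, b in zip(params["W_out"], params["b_out"])
    ]
    return softmax(logits)


# Made-up parameters: 3-d input for modality A, 2-d input for modality B,
# two hidden units per flow, two output classes.
params = {
    "W_a": [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]],
    "b_a": [0.0, 0.1],
    "W_b": [[0.5, -0.5], [0.1, 0.2]],
    "b_b": [0.0, 0.0],
    "W_out": [[0.3, -0.2, 0.1, 0.4], [-0.1, 0.2, 0.3, -0.3]],
    "b_out": [0.0, 0.0],
}
probs = late_fusion_predict([1.0, 2.0, 3.0], [0.5, 1.5], params)
```

Because the inputs are never concatenated directly, each flow can have its own input dimensionality and depth, which matches the motivation given in the quoted passage.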

Deep learning for video game genre classification

Jiang, Zheng. 2020. Preprint


Video game covers and textual descriptions are usually the very first impression to its consumers and they often convey important information about the video games. Video game genre classification based on its cover and textual description would be utterly beneficial to many modern identification, collocation, and retrieval systems. At the same time, it is also an extremely challenging task due to the following reasons: First, there exists a wide variety of video game genres, many of which are not concretely defined. Second, video game covers vary in many different ways such as colors, styles, textual information, etc, even for games of the same genre. Third, cover designs and textual descriptions may vary due to many external factors such as country, culture, target reader populations, etc. With the growing competitiveness in the video game industry, the cover designers and typographers push the cover designs to their limit in the hope of attracting sales. The computer-based automatic video game genre classification systems have become a particularly exciting research topic in recent years. In this paper, we propose a multi-modal deep learning framework to solve this problem. The contribution of this paper is four-fold. First, we compile a large dataset consisting of 50,000 video games from 21 genres made of cover images, description text, and title text and the genre information. Second, image-based and text-based, state-of-the-art models are evaluated thoroughly for the task of genre classification for video games. Third, we developed an efficient and
