Deep learning for 3D shape classification from multiple depth maps

Zanuttigh, Pietro; Minto, Ludovico

doi:10.1109/icip.2017.8296956

Cited by 44 publications

(16 citation statements)

References 11 publications

“…More research work to exploit multi-view 3D data was carried out. Zanuttigh and Minto in [121] used a multi-branch CNN to classify different 3D objects. In this work, the input consists of a rendered depth maps from different point of views of the 3D object and five convolutional layers for each CNN branch to process each depth maps to produce a class file vector.…”

Section: B Performance Of Deep Learning Methods On Multi-view Data (mentioning)

confidence: 99%

A Review on Deep Learning Approaches for 3D Data Representations in Retrieval and Classifications

et al. 2020

Deep learning approach has been used extensively in image analysis tasks. However, implementing the methods in 3D data is a bit complex because most of the previously designed deep learning architectures used 1D or 2D as input. In this work, the performance of deep learning methods on different 3D data representations has been reviewed. Based on the categorization of the different 3D data representations proposed in this paper, the importance of choosing a suitable 3D data representation which depends on simplicity, usability, and efficiency has been highlighted. Furthermore, the origin and contents of the major 3D datasets were discussed in detail. Due to growing interest in 3D object retrieval and classification tasks, the performance of different 3D object retrieval and classification on ModelNet40 dataset were compared. According to the findings in this work, multi views methods surpass voxel-based methods and with increased layers and enough data augmentation the performance can still be increased. Therefore, it can be concluded that deep learning together with a suitable 3D data representation gives an effective approach for improving the performance of 3D shape analysis. Finally, some possible directions for future researches were suggested.

“…• Our method is consistently competitive compared with other representative view-based and model-based methods for both 3D model retrieval and classification tasks, which demonstrates the superiority and efficiency of our proposed method.

[Comparison table; header row not captured. Rows: method [reference]: first score, second score; "-" = no value.]

Shape-based
[9]: 77.0%, -
3D-GAN [27]: 83.3%, -
VSL [28]: 84.5%, -
binVoxNetPlus [29]: 85.47%, -
PointNet [30]: 89.2%, -
kd-Networks [31]: 91.8%, -
3D-A-Nets [32]: 90.5%, 80.1%
G3DNet [13]: 91.13%, -
PointNet++ [33]: 91.9%, -

View-based
DeepPano [34]: 77.6%, 76.8%
GIFT [17]: 83.1%, 81.9%
Geometry Image [35]: 83.9%, 53.1%
Multiple Depth Maps [36]: 87.8%, -
MVCNN [18]: 90.1%, 79.5%
PANORAMA-NN [37]: 90.7%, 83.5%
Pairwise [38]: 90.7%, -
MVCNN-MultiRes [39]: 91.4%, -
MVTS (Our): 93.4%, 87.3%

• Previous view-based methods usually just select one representative view from the view sequence of the model, or employ simple view-level aggregation strategy, like the max-pooling (e.g. MVCNN) method to fuse multiple views.…”
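The max-pooling view aggregation mentioned in the statement above can be illustrated with a minimal sketch. It assumes each view has already been reduced to a fixed-length feature vector; the function name `max_pool_views` and the feature values are hypothetical, chosen only for illustration, and are not taken from MVCNN or from the citing paper.

```python
from typing import List


def max_pool_views(view_features: List[List[float]]) -> List[float]:
    """Fuse per-view feature vectors by taking the element-wise maximum."""
    if not view_features:
        raise ValueError("need at least one view")
    dim = len(view_features[0])
    if any(len(f) != dim for f in view_features):
        raise ValueError("all view features must have the same length")
    # zip(*...) groups the i-th component of every view together.
    return [max(component) for component in zip(*view_features)]


# Three hypothetical views, each described by a 4-dimensional feature vector.
views = [
    [0.1, 0.9, 0.0, 0.3],
    [0.4, 0.2, 0.5, 0.3],
    [0.2, 0.1, 0.7, 0.8],
]
pooled = max_pool_views(views)
print(pooled)  # [0.4, 0.9, 0.7, 0.8]
```

The pooled vector keeps only the strongest response per feature, so it does not record which view produced it. That loss of per-view context is the limitation the statement attributes to simple aggregation strategies.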

Section: A Comparison With the State-of-the-art Methods (mentioning)

confidence: 99%

Multi-View Tree Structure Learning for 3D Model Retrieval and Classification in Smart City

Liu

Zhao

et al. 2020

IEEE Access

The application of digital products in smart city results in ever-increasing 3D model data and how to obtain the relevant 3D model becomes a crucial issue. In this paper, we propose the Multi-View Tree Structure (MVTS) learning for 3D model retrieval and recognition. MVTS contains three key consecutive modules. Firstly, the visual feature learning module extracts the visual features of multiple views. Then, we design a score matrix to estimate the value of contextual information between view pairs. Based on the score matrix, a maximum spanning tree is constructed to further explore the contextual information within multiple views. Then, we utilize the bidirectional Tree-LSTM to encode the contextual information among views and the spatial information of tree structure and optimize the tree parameters. After that, the tree attention strategy is adopted to explore the importance of each view. Comparing to existing methods, our proposed method explores the spatial information of 3D model without the requirement of specific camera settings, which is more suitable for real applications. Moreover, our method jointly realizes the feature learning, view-wise contextual information and tree spatial information encoding and view importance estimating, which enhances the discrimination of the 3D model representation. Extensive experimental results on Modelnet40 and ShapeNetCore55 demonstrate the superiority of our method.
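The step of building a maximum spanning tree from a pairwise score matrix, described in the abstract above, can be sketched as follows. The abstract does not say how the scores are computed or which spanning-tree algorithm is used, so this sketch assumes a symmetric score matrix and uses Prim's algorithm. The function name and the example scores are hypothetical.

```python
from typing import List, Tuple


def maximum_spanning_tree(score: List[List[float]]) -> List[Tuple[int, int]]:
    """Return the edges (parent, child) of a maximum spanning tree over views.

    Assumes score[i][j] == score[j][i]; grows the tree from view 0 (Prim).
    """
    n = len(score)
    if n == 0:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    best = list(score[0])   # best known score linking each view to the tree
    parent = [0] * n        # tree-side endpoint of that best link
    edges: List[Tuple[int, int]] = []
    for _ in range(n - 1):
        # Pick the view outside the tree with the highest link score.
        j = max((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        edges.append((parent[j], j))
        in_tree[j] = True
        # The newly added view may offer better links to the remaining views.
        for v in range(n):
            if not in_tree[v] and score[j][v] > best[v]:
                best[v] = score[j][v]
                parent[v] = j
    return edges


# Hypothetical symmetric scores between four views.
scores = [
    [0.0, 0.9, 0.2, 0.4],
    [0.9, 0.0, 0.8, 0.1],
    [0.2, 0.8, 0.0, 0.7],
    [0.4, 0.1, 0.7, 0.0],
]
tree = maximum_spanning_tree(scores)
print(tree)  # [(0, 1), (1, 2), (2, 3)]
```

With these scores the tree links the views through the three highest-scoring pairs that do not form a cycle. A structure of this kind is what a tree-structured encoder, such as the Tree-LSTM named in the abstract, would then traverse.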

“…The RGBD benchmark dataset [20] has two issues for training multiview based CNNs: insufficient number of object instances per category (which is a minimum of two for training) and inconsistent cases to the upright orientation assumption. There are several cases where the upright orientation assumption is actually invalid; the attitudes of object instances against the rotation axis are inconsistent in some

[Comparison table; header row not captured. Rows: method [reference]: first score, second score, third score; "-" = no value.]

[39]: 95.0, 92.4, -
MVCNN-MultiRes [27]: 93.8, 91.4, -
Dominant Set Clustering [40]: 93.8, -, -
Kd-Networks [17]: 91.8, -, 94.0
VRN-single [4]: 91.33, -, 93.61
FusionNet [14]: 90.80, -, 93.11
Pairwise [16]: 90.70, -, 92.80
PANORAMA-NN [32]: 90.7, -, 91.1
DeepSets [44]: 90.3, -, -
MVCNN [36]: 90.10, 90.10, -
ORION [31]: 89.7, -, 93.80
PointNet [26]: 89.2, 86.2, -
LightNet [47]: 88.93, -, 93.94
FPNN [22]: 88.4, -, -
Multiple Depth Maps [45]: 87.8, -, 91.5
ECC [34]: 87.4, 83.2, 90.8
VoxNet [23]: 85.9, 83.0, -
3DShapeNets [42]: 84.7, 77.3, -
Geometry Image [35]: 83.

We captured each object instance with M_e = 10 levels of elevation angles and 16 levels of azimuth angles to obtain 160 images.…”
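The capture setup in the statement above (10 elevation levels times 16 azimuth levels, giving 160 images) can be checked with a short sketch. The statement gives only the counts, so the evenly spaced angles and the 0 to 90 degree elevation range used here are assumptions, and the function name `view_grid` is hypothetical.

```python
from itertools import product
from typing import List, Tuple


def view_grid(
    n_elev: int = 10,
    n_azim: int = 16,
    elev_range: Tuple[float, float] = (0.0, 90.0),  # assumed, not from the source
) -> List[Tuple[float, float]]:
    """Enumerate (elevation, azimuth) pairs in degrees on an evenly spaced grid."""
    lo, hi = elev_range
    if n_elev == 1:
        elevations = [lo]
    else:
        elevations = [lo + (hi - lo) * i / (n_elev - 1) for i in range(n_elev)]
    # Azimuth covers the full circle without repeating 0 and 360 degrees.
    azimuths = [360.0 * j / n_azim for j in range(n_azim)]
    return list(product(elevations, azimuths))


views = view_grid()
print(len(views))  # 160
```

Each object instance therefore contributes one image per grid point, which is where the figure of 160 comes from.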

Section: Experiments On a 3d Rotated Real Image Dataset (mentioning)

confidence: 99%

RotationNet for Joint Object Categorization and Unsupervised Pose Estimation from Multi-View Images

Kanezaki

Matsushita

Nishida

2021

IEEE Trans. Pattern Anal. Mach. Intell.

We propose a Convolutional Neural Network (CNN)-based model "RotationNet," which takes multi-view images of an object as input and jointly estimates its pose and object category. Unlike previous approaches that use known viewpoint labels for training, our method treats the viewpoint labels as latent variables, which are learned in an unsupervised manner during the training using an unaligned object dataset. RotationNet uses only a partial set of multi-view images for inference, and this property makes it useful in practical scenarios where only partial views are available. Moreover, our pose alignment strategy enables one to obtain view-specific feature representations shared across classes, which is important to maintain high accuracy in both object categorization and pose estimation. Effectiveness of RotationNet is demonstrated by its superior performance to the state-of-the-art methods of 3D object classification on 10-and 40-class ModelNet datasets. We also show that RotationNet, even trained without known poses, achieves comparable performance to the state-of-the-art methods on an object pose estimation dataset. Furthermore, our object ranking method based on classification by RotationNet achieved the first prize in two tracks of the 3D Shape Retrieval Contest (SHREC) 2017. Finally, we demonstrate the performance of real-world applications of RotationNet trained with our newly created multi-view image dataset using a moving USB camera.
