The rapid development of 3D technology has led to a dramatic increase in 3D data, making scalable and effective 3D object retrieval and classification algorithms essential for large-scale 3D object management. A critical problem in view-based 3D object retrieval and classification is how to exploit the relevance and discrimination among multiple views. In this paper, we propose a multi-view hierarchical fusion network (MVHFN) for these two tasks. The method contains two key modules. First, a visual feature learning module applies 2D CNNs to extract visual features from multiple views rendered around a given 3D object. Then, the proposed multi-view hierarchical fusion module fuses the multiple view features into a compact descriptor. This module not only fully exploits the relevance among multiple views through an intra-cluster multi-view fusion mechanism, but also discovers content discrimination through an inter-cluster multi-view fusion mechanism. Experimental results on two public datasets, i.e., ModelNet40 and ShapeNetCore55, show that the proposed MVHFN outperforms current state-of-the-art methods on both 3D object retrieval and classification tasks.
Index Terms: 3D object retrieval, 3D object classification, 3D shape recognition, multi-view.
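To make the pipeline concrete, the following is a minimal PyTorch sketch of the view-based scheme the abstract describes: a shared 2D CNN extracts one feature per rendered view, and the views are then fused hierarchically into a single compact descriptor. The ResNet-18 backbone, the cluster count, the contiguous grouping of views into clusters, and the mean/max pooling choices are illustrative assumptions, not the authors' exact MVHFN design.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class MultiViewNet(nn.Module):
        def __init__(self, n_classes=40, n_clusters=4):
            super().__init__()
            backbone = models.resnet18(weights=None)  # assumed backbone; any 2D CNN works
            self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc head
            self.n_clusters = n_clusters
            self.fc = nn.Linear(512, n_classes)

        def forward(self, views):                  # views: (B, V, 3, H, W), V divisible by n_clusters
            b, v = views.shape[:2]
            feats = self.cnn(views.flatten(0, 1))  # shared CNN weights across all views
            feats = feats.view(b, v, -1)           # (B, V, 512) per-view features
            # intra-cluster fusion: mean-pool views within each contiguous cluster
            clusters = feats.view(b, self.n_clusters, v // self.n_clusters, -1)
            intra = clusters.mean(dim=2)           # (B, K, 512)
            # inter-cluster fusion: max-pool across clusters for discrimination
            desc = intra.max(dim=1).values         # (B, 512) compact descriptor
            return self.fc(desc), desc             # classification logits + retrieval descriptor

    model = MultiViewNet()
    logits, descriptor = model(torch.randn(2, 12, 3, 224, 224))  # e.g., 12 rendered views

The single descriptor returned here serves both tasks: the logits feed the classifier, while the 512-dimensional vector can be compared with cosine or Euclidean distance for retrieval.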
In recent years, with the rapid development of 3D technology, view-based methods have shown excellent performance on both 3D model classification and retrieval tasks. In view-based methods, how to aggregate multi-view features is a key issue. Existing methods commonly adopt one of two solutions: 1) pooling strategies that merge multi-view features but ignore the context information contained in the continuous view sequence, or 2) grouping strategies or long short-term memory (LSTM) networks that select representative views of the 3D model but easily neglect the semantic information of individual views. In this paper, we propose a novel Semantic and Context information Fusion Network (SCFN) to compensate for these drawbacks. First, we render views from multiple perspectives of the 3D model and extract raw features for each view with 2D convolutional neural networks (CNNs). Then we design a channel attention mechanism (CAM) to exploit view-wise semantic information: by modeling the correlation among view feature channels, we assign higher weights to useful feature attributes while suppressing useless ones. Next, we propose a context information fusion module (CFM) that fuses the multiple view features into a compact 3D representation. Extensive experiments on three popular datasets, i.e., ModelNet10, ModelNet40, and ShapeNetCore55, demonstrate the superiority of the proposed method over the state of the art on both 3D classification and retrieval tasks.
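The channel attention idea in this abstract can be sketched as follows: a small gating network learns a per-channel weight from each view feature, so informative channels are amplified and uninformative ones suppressed. The squeeze-and-excitation-style two-layer gate and the reduction ratio are assumptions for illustration, not the paper's exact CAM design.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        def __init__(self, channels=512, reduction=16):  # reduction ratio is an assumption
            super().__init__()
            self.gate = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),              # per-channel weights in (0, 1)
            )

        def forward(self, view_feats):     # view_feats: (B, V, C) raw per-view features
            weights = self.gate(view_feats)  # channel weights learned from the features themselves
            return view_feats * weights     # reweighted view features

    cam = ChannelAttention()
    refined = cam(torch.randn(2, 12, 512))  # e.g., 12 views with 512-dim CNN features

The reweighted per-view features would then be passed to a context fusion stage (the paper's CFM) that aggregates them across the view sequence into one compact 3D representation.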