The appearance of material images varies with illumination intensity, viewing angle, shooting distance, and other imaging conditions. Feature learning has shown great potential for addressing this issue; however, the knowledge obtained by simple feature fusion is insufficient to fully represent material images. In this study, we exploit the diverse knowledge learned by a novel progressive feature fusion method to improve recognition performance. To obtain implicit cross-modal knowledge, we perform early feature fusion by capturing the cluster canonical correlations among heterogeneous state-of-the-art (SOTA) squeeze-and-excitation network (SENet) features, yielding a set of more discriminative deep-level visual semantics (DVSs). We then perform gene-selection-based middle feature fusion to thoroughly exploit the feature-shared knowledge among the generated DVSs. Finally, any general classifier can use this feature-shared knowledge to perform the final material recognition. Experimental results on two public datasets (Fabric and MattrSet) show that our method outperforms other SOTA baseline methods in terms of both accuracy and real-time efficiency. Even traditional classifiers achieved satisfactory performance with our method, demonstrating its high practicality.
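To make the three-stage pipeline concrete, the sketch below follows the same structure under loose assumptions: scikit-learn's standard CCA stands in for the paper's cluster canonical correlation analysis, mutual-information feature selection stands in for its gene-selection step, and random matrices stand in for precomputed heterogeneous SENet features. It is an illustrative approximation, not the authors' implementation.

```python
# Minimal sketch of the progressive fusion pipeline, assuming:
# - standard CCA approximates cluster canonical correlation analysis,
# - mutual-information feature selection approximates gene selection,
# - feats_a / feats_b are placeholders for heterogeneous SENet features.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d_a, d_b = 200, 64, 64                    # toy sizes in place of real features
feats_a = rng.normal(size=(n, d_a))          # view 1 (e.g., one SENet branch)
feats_b = rng.normal(size=(n, d_b))          # view 2 (e.g., another SENet branch)
labels = rng.integers(0, 5, size=n)          # material class labels

# Early fusion: project both views into a shared correlated subspace and
# concatenate the projections (a stand-in for the discriminative DVSs).
cca = CCA(n_components=16).fit(feats_a, feats_b)
proj_a, proj_b = cca.transform(feats_a, feats_b)
dvs = np.hstack([proj_a, proj_b])

# Middle fusion: keep the most class-informative fused dimensions,
# analogous to gene selection over the generated DVSs.
selector = SelectKBest(mutual_info_classif, k=24)
shared = selector.fit_transform(dvs, labels)

# Final recognition: any general classifier can consume the shared features.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(shared, labels)
print("train accuracy:", clf.score(shared, labels))
```

In this reading, the classifier at the end is deliberately interchangeable, which matches the claim that even traditional classifiers perform well once the fused, selected features are available.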