Material recognition is a fundamental problem in computer vision, yet it remains challenging because of varying camera perspectives and illumination conditions. Feature learning and feature engineering provide an important foundation for effective material recognition. Most traditional and deep learning-based features point to the same or similar material semantics from diverse visual perspectives, indicating implicit complementary information (cross-modal semantics) among these heterogeneous features. However, only a few studies focus on mining the cross-modal semantics among heterogeneous image features, which can be exploited to boost recognition performance. To address this issue, we first improve the well-known multiset discriminant correlation analysis model to fully mine the cross-modal semantics among heterogeneous image features. We then propose a novel hierarchical multi-feature fusion (HMF²) model to gather this information and create novel, more effective, and more robust features. Finally, a general classifier is trained on the fused features to perform material recognition. Experimental results on two benchmark datasets demonstrate the simplicity, effectiveness, robustness, and efficiency of the HMF² model. Furthermore, based on the HMF² model, we design an end-to-end online system for real-time material recognition.
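To make the fuse-then-classify idea concrete, the sketch below shows a heavily simplified two-feature version of the pipeline. The paper's improved multiset discriminant correlation analysis is not publicly packaged, so scikit-learn's standard two-set CCA stands in for the cross-modal projection step; the feature dimensions, the linear SVM, and the random stand-in features are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch, assuming two heterogeneous feature sets X_a and X_b
# (e.g., from two different backbones) describing the same images.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def fuse_and_train(X_a, X_b, y, n_components=32):
    """Project two feature sets into a shared correlated subspace,
    concatenate the projections (early fusion), and train a classifier."""
    cca = CCA(n_components=n_components)
    Z_a, Z_b = cca.fit_transform(X_a, X_b)   # maximally correlated views
    Z = np.hstack([Z_a, Z_b])                # fused cross-modal features
    clf = make_pipeline(StandardScaler(), LinearSVC())
    clf.fit(Z, y)
    return cca, clf

# Usage with random stand-in features and hypothetical labels:
rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 512))   # features from backbone A (assumed dims)
X_b = rng.normal(size=(200, 256))   # features from backbone B (assumed dims)
y = rng.integers(0, 5, size=200)    # material class labels
cca, clf = fuse_and_train(X_a, X_b, y)
```

Extending from two feature sets to the multiset, discriminant setting described in the abstract would replace the CCA step, but the overall project-fuse-classify structure is the same.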
Training an effective image sentiment analysis model remains challenging because it requires both high-quality samples and the implicit cross-modal semantics among heterogeneous features. To address this problem, we propose an active sample refinement (ASR) strategy to obtain sufficient high-quality images with definite sentiment semantics. We then mine the cluster correlations among heterogeneous SENet features, generating discriminative cross-modal semantics for training an effective and robust image classifier. Ensemble learning is employed to further boost performance. Our method outperforms other competitive baselines, demonstrating its effectiveness and robustness. Moreover, the ASR strategy is a useful supplement to current data augmentation methods.
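The sketch below illustrates two steps named in the abstract under stated assumptions: an ASR-style filter that keeps only images whose predicted sentiment is definite (high top-class probability), and a soft-voting ensemble over base classifiers trained on the refined features. The 0.8 threshold, the use of a preliminary probe classifier, and the choice of base learners are assumptions for illustration, not the paper's settings.

```python
# Minimal sketch of confidence-based sample refinement plus an ensemble.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def refine_samples(probe_clf, X, y, threshold=0.8):
    """Keep only samples whose sentiment the fitted probe classifier
    predicts with high confidence; drop ambiguous ones."""
    top_proba = probe_clf.predict_proba(X).max(axis=1)
    keep = top_proba >= threshold
    return X[keep], y[keep]

def build_ensemble():
    """Soft voting: average predicted class probabilities across learners."""
    return VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=200)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        voting="soft",
    )
```

A typical flow would fit a simple probe classifier first, call `refine_samples` to discard low-confidence images, and then fit `build_ensemble()` on the refined set.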
Material images are highly susceptible to changes in light intensity, viewing angle, shooting distance, and other imaging conditions. Feature learning has shown great potential for addressing this issue; however, the knowledge captured by simple feature fusion is insufficient to fully represent material images. In this study, we exploit the diverse knowledge learned by a novel progressive feature fusion method to improve recognition performance. To obtain implicit cross-modal knowledge, we perform early feature fusion by capturing the cluster canonical correlations among state-of-the-art (SOTA) heterogeneous squeeze-and-excitation network (SENet) features, yielding a set of more discriminative deep-level visual semantics (DVSs). We then perform gene selection-based middle feature fusion to thoroughly exploit the feature-shared knowledge among the generated DVSs. Finally, any general classifier can use this feature-shared knowledge to perform the final material recognition. Experimental results on two public datasets (Fabric and MattrSet) show that our method outperforms other SOTA baselines in both accuracy and real-time efficiency. Even most traditional classifiers achieve satisfactory performance with our method, demonstrating its high practicality.
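As a rough illustration of the middle-fusion step, the sketch below applies a filter-style feature selector to the early-fused DVS matrix before classification. Scikit-learn's mutual-information-based `SelectKBest` is only a stand-in for the paper's gene selection-based method, which is not reproduced here; the value of `k` and the final classifier are illustrative assumptions.

```python
# Minimal sketch: select the most discriminative DVS dimensions, then
# hand the selected features to any general classifier.
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def middle_fusion_classifier(k=128):
    """Feature selection over the fused DVSs followed by a general
    classifier (here a linear SVM; any classifier could be swapped in)."""
    return make_pipeline(
        SelectKBest(mutual_info_classif, k=k),  # keep top-k informative dims
        LinearSVC(),
    )

# Usage, assuming Z is the early-fused DVS matrix (n_samples x n_dims,
# with n_dims >= k) and y holds the material labels:
#   clf = middle_fusion_classifier(k=128).fit(Z, y)
```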