In computer vision, fine-grained classification, the task of recognizing objects that differ only slightly in appearance, has become an important problem. Traditional convolutional neural networks often struggle to perform well on it. To improve accuracy and reduce training time for convolutional neural networks on fine-grained classification problems, this paper proposes a tree-structured framework that eliminates the effect of differences between clusters. The proposed method makes three contributions: (1) a self-supervised method that automatically creates a classification tree, eliminating the need for manual labeling; (2) a machine-learning matcher that determines the cluster to which an item belongs, minimizing the impact of inter-cluster variations on classification; and (3) a pruning criterion that filters the tree-structured classifier, retaining only the models with superior classification performance. Experimental evaluation shows that, compared with conventional convolutional neural network models, the proposed framework reduces training time and improves fine-grained classification accuracy across several datasets. Specifically, on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets, the proposed method reduces training time by 32.91%, 35.87%, and 14.48% and improves classification accuracy by 1.17%, 2.01%, and 0.59%, respectively.
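To make the interplay of the matcher and the pruning criterion concrete, the following is a minimal sketch and not the paper's implementation. It assumes a one-level tree, uses a nearest-centroid rule as a stand-in for the machine-learning matcher, and assumes a pruning rule that keeps a cluster-specific model only when its validation accuracy exceeds that of a flat baseline model. All names (`nearest_centroid`, `prune`, `route`) and all numbers in the usage lines are hypothetical.

```python
from typing import Callable, Dict, Sequence, Set

Vector = Sequence[float]
Model = Callable[[Vector], str]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(feature: Vector, centroids: Sequence[Vector]) -> int:
    """Stand-in matcher (assumption): index of the closest cluster centroid."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(feature, centroids[i]))


def prune(leaf_accuracy: Dict[int, float], baseline_accuracy: float) -> Set[int]:
    """Assumed pruning rule: keep leaves that beat the flat baseline."""
    return {c for c, acc in leaf_accuracy.items() if acc > baseline_accuracy}


def route(feature: Vector,
          centroids: Sequence[Vector],
          kept: Set[int],
          leaf_models: Dict[int, Model],
          fallback: Model) -> str:
    """Send the item to its cluster's model, or to the fallback if pruned."""
    cluster = nearest_centroid(feature, centroids)
    model = leaf_models[cluster] if cluster in kept else fallback
    return model(feature)


# Hypothetical toy setup: two clusters, dummy models that return a tag.
centroids = [(0.0, 0.0), (10.0, 10.0)]
leaf_models = {0: lambda f: "leaf-0", 1: lambda f: "leaf-1"}
fallback = lambda f: "flat"

# Made-up validation accuracies; cluster 1 falls below the baseline.
kept = prune({0: 0.91, 1: 0.80}, baseline_accuracy=0.85)

near_cluster_0 = route((1.0, 0.5), centroids, kept, leaf_models, fallback)
near_cluster_1 = route((9.0, 11.0), centroids, kept, leaf_models, fallback)
```

In this toy setup the first item is handled by the retained cluster-0 model, while the second falls back to the flat model because cluster 1 was pruned. Under these assumptions, each retained leaf model only has to separate classes within its own cluster.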