2020
DOI: 10.48550/arxiv.2003.03836
Preprint

Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches

Abstract: Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works mainly tackle this problem by focusing on how to locate the most discriminative parts, more complementary parts, and parts of various granularities. However, less effort has been placed on identifying which granularities are the most discriminative and how to fuse information across multiple granularities. In this work, we propose a novel framework for fine…
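The abstract describes training on jigsaw patches as a way to expose the network to parts of different granularities. As a rough illustration of that idea, below is a minimal PyTorch sketch that splits each image in a batch into an n×n grid and randomly permutes the patches; the function name jigsaw_shuffle and the single shared permutation per batch are assumptions for illustration, not the authors' implementation.

```python
import torch

def jigsaw_shuffle(images: torch.Tensor, n: int) -> torch.Tensor:
    """Split each image into an n x n grid of patches and shuffle them.

    `images` has shape (B, C, H, W) with H and W divisible by n.
    A minimal sketch of the jigsaw-patch idea; the shuffling scheme
    (one shared random permutation per batch) is an assumption.
    """
    b, c, h, w = images.shape
    ph, pw = h // n, w // n                        # patch height / width
    # Cut the grid: (B, C, H, W) -> (B, C, n, n, ph, pw)
    patches = images.unfold(2, ph, ph).unfold(3, pw, pw)
    patches = patches.contiguous().view(b, c, n * n, ph, pw)
    # Randomly permute the n*n patch positions
    perm = torch.randperm(n * n)
    patches = patches[:, :, perm]
    # Stitch the shuffled patches back into full images
    patches = patches.view(b, c, n, n, ph, pw)
    return patches.permute(0, 1, 2, 4, 3, 5).contiguous().view(b, c, h, w)

# Example: 8x8 jigsaw patches on a batch of 224x224 images
x = torch.randn(4, 3, 224, 224)
shuffled = jigsaw_shuffle(x, n=8)
```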

Cited by 6 publications (5 citation statements) | References 37 publications

“…For ImageNet (Deng et al 2009), ResNet-50 and ResNet-200 (He et al 2016) are adopted as target models, which are trained from scratch. For fine-grained image classification, we follow the previous works (Du et al 2020;Chen et al 2019b) and utilize pre-trained ResNet-50 and ResNet-101 models as the target network. Unless specified otherwise, the input image size is 32×32 for CIFAR while 224×224 for ImageNet, CUB-200-2011 and Stanford Dogs.…”
Section: Experiments and Results (mentioning)
confidence: 99%
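The quoted setup (an ImageNet-pretrained ResNet-50 with 224×224 inputs for CUB-200-2011 and Stanford Dogs) corresponds to a standard torchvision pipeline; a minimal sketch is shown below, where the 200-class head (matching CUB-200-2011) is an illustrative assumption.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ImageNet-pretrained ResNet-50 as the target network, with the
# classification head replaced for the fine-grained dataset.
# num_classes=200 matches CUB-200-2011 and is an assumption here.
model = models.resnet50(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 200)

# 224x224 inputs, as in the quoted experimental setup.
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```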
“…
Method               Backbone            IDRiD    DeepDR
B-CNN (ICCV15) [18]  VGG16               0.8631   0.8702
HBP (ECCV18) [4]     VGG16               0.8511   0.8586
DFL (CVPR18) [19]    ResNet50            0.8804   0.8926
PMG (CVPR20) [20]    ResNet50            0.8694   0.8825
AG Net (MIA20) [21]  AlexNet+GoogleNet   0.8573   0.8644
Our Method           DSOD                0.8874   0.9050

Our method achieves superior performance. This shows the effectiveness of the proposed approach.…”
Section: Methods (mentioning)
confidence: 99%
“…In this section, we compare the performance of the proposed method with the state-of-the-art fine-grained classification methods, such as B-CNN [18], HBP [4], DFL [19], and PMG [20]. Meanwhile, the proposed method is also compared with the ensembling method [21] which is the fusion of AlexNet and GoogleNet.…”
Section: Comparison With the State-of-the-art (mentioning)
confidence: 99%
“…For example, ImageNet is used in [8] for joint learning (Flowers: 97.7%), and pretrained sub-networks or even higher image resolutions (e.g. 448×448 in PMG [17], 480×480 in GPipe [29], MC [8], etc.) are considered for improving FGVC accuracy.…”
Section: Fine-grained Image Classification (mentioning)
confidence: 99%