2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00953
Deformable ConvNets V2: More Deformable, Better Results

Abstract: The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable…
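The abstract describes deformable convolution adapting its spatial support by sampling features at learned fractional offsets, with the V2 reformulation additionally modulating the amplitude of each sample. As a rough single-channel illustration of that sampling scheme (a sketch of the general technique, not the paper's implementation; function names and the explicit loops are illustrative), the following computes one output value of a 3x3 modulated deformable convolution:

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinearly sample a 2-D array at fractional location (y, x); zero outside."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                # Weight each corner by its distance to the sample point.
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return val

def modulated_deform_sample(img, py, px, weights, offsets, mask):
    """One output value of a 3x3 modulated deformable convolution centered at
    integer location (py, px). `offsets` holds a learned (dy, dx) per kernel tap
    and `mask` a learned per-tap amplitude in [0, 1], as in the V2 formulation."""
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]
            # Sample at the offset position, scale by the modulation scalar.
            out += weights[k] * mask[k] * bilinear(img, py + dy + oy, px + dx + ox)
            k += 1
    return out
```

With all offsets at zero and the mask at one, this reduces to a regular 3x3 convolution; nonzero offsets shift each tap to a fractional position, and a mask value below one attenuates taps that land on irrelevant content.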


Cited by 1,926 publications (1,113 citation statements)
References 35 publications
“…The deformable convolutions help with better feature sampling by aligning the sampling positions with the instances of interest and better handle changes in scale, rotation, and aspect ratio. Importantly, with our exploration of using fewer deformable convolution layers, we can cut their speed overhead significantly (from 8 ms to 2.8 ms) while keeping performance almost the same (only a 0.2 mAP drop) compared to the original configuration proposed in [13]; see Table 7. With these two upgrades for object detection, YOLACT++ suffers less from localization failure and has finer mask predictions, as shown in Figure 10b, c, which together result in 3.4 mAP and 4.2 mAP boosts for ResNet-101 and ResNet-50, respectively.…”
Section: Box Results
Confidence: 99%
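The quoted passage reports that restricting deformable convolution to only a subset of layers recovers most of the speed while losing almost no accuracy. A minimal sketch of that kind of configuration decision, with purely hypothetical layer names and flag (not YOLACT++'s actual code):

```python
def plan_deformable_layers(layer_names, num_deformable):
    """Return (name, use_deformable) pairs in which only the final
    `num_deformable` layers are swapped for deformable convolutions,
    trading a small accuracy gain against per-layer sampling overhead."""
    n = len(layer_names)
    return [(name, i >= n - num_deformable) for i, name in enumerate(layer_names)]

# Hypothetical backbone stages; only the last two would be made deformable.
backbone = ["conv1", "res2", "res3", "res4", "res5"]
plan = plan_deformable_layers(backbone, 2)
```

Because the extra cost of offset prediction and bilinear sampling is paid per deformable layer, limiting the swap to late, semantically rich stages is a natural way to realize the speed/accuracy trade-off the quote describes.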
“…Understanding the AP Gap: However, localization failure and leakage alone are not enough to explain the almost 6 mAP gap between YOLACT's base model and, say, Mask R-CNN. Indeed, our base model on COCO has just a 2.5 mAP difference between its test-dev mask and box mAP (29.8 mask, 32.3 box), meaning our base model would only gain a few points of mAP even…”
Section: Discussion
Confidence: 99%