Deformable ConvNets V2: More Deformable, Better Results

Zhu, Xizhou; Hu, Han; Lin, Stephen; Dai, Jifeng

doi:10.1109/cvpr.2019.00953

Cited by 1,926 publications (1,113 citation statements, 1,105 of them classified as mentioning)

References 35 publications

“…The deformable convolutions help with better feature sampling by aligning the sampling positions with the instances of interest, and better handle changes in scale, rotation, and aspect ratio. Importantly, with our exploration of using fewer deformable convolution layers, we can cut down their speed overhead significantly (from 8 ms to 2.8 ms) while keeping the performance almost the same (only a 0.2 mAP drop) as compared to the original configuration proposed in [13]; see Table 7. With these two upgrades for object detection, YOLACT++ suffers less from localization failure and has finer mask predictions, as shown in Figure 10b, c, which together result in 3.4 mAP and 4.2 mAP boosts for ResNet-101 and ResNet-50, respectively.…”

Section: Box Results (mentioning, confidence: 99%)

“…Understanding the AP Gap. However, localization failure and leakage alone are not enough to explain the almost 6 mAP gap between YOLACT's base model and, say, Mask R-CNN. Indeed, our base model on COCO has just a 2.5 mAP difference between its test-dev mask and box mAP (29.8 mask, 32.3 box), meaning our base model would only gain a few points of mAP even … [13] in YOLACT. Results on MS COCO val2017.…”

Section: Discussion (mentioning, confidence: 99%)

“…Deformable Convolution Networks (DCNs) [12], [13] have proven to be effective for object detection, semantic segmentation, and instance segmentation due to their replacement of the rigid grid sampling used in conventional convnets with free-form sampling. We follow the design choice made by DCNv2 [13] and replace the 3x3 convolution layer in each ResNet block with a 3x3 deformable convolution layer for C3 to C5. Note that we do not use the modulated deformable modules because we can't afford the inference time overhead that they introduce.…”

Section: Deformable Convolution With Intervals (mentioning, confidence: 99%)
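To make the "free-form sampling" in the quote above concrete, here is a minimal single-channel NumPy sketch of a 3x3 deformable convolution: each of the nine kernel taps reads the input at its regular grid position plus a fractional offset, using bilinear interpolation. It assumes stride 1, zero padding, and offsets passed in directly as an array (in a real DCN a parallel convolution predicts them). The optional `mask` argument stands in for the per-tap modulation scalars in [0, 1] that DCNv2 adds; leaving it as `None` gives the unmodulated variant the citing authors chose. All function names are hypothetical, not any library's API.

```python
import numpy as np

# 3x3 regular sampling grid as (dy, dx) pairs; tap k uses weight[dy + 1, dx + 1].
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(img, y, x):
    """Sample a 2-D array at fractional (y, x); positions outside the image read as 0."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return val


def deform_conv3x3(img, weight, offsets, mask=None):
    """Hypothetical single-channel 3x3 deformable convolution (stride 1, same-size output).

    img:     (H, W) input
    weight:  (3, 3) kernel
    offsets: (H, W, 9, 2) (dy, dx) shift for every output pixel and tap
    mask:    optional (H, W, 9) modulation scalars (DCNv2-style); None means unmodulated
    """
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k, (gy, gx) in enumerate(GRID):
                dy, dx = offsets[i, j, k]
                m = 1.0 if mask is None else mask[i, j, k]
                # Regular grid position plus offset, read with bilinear interpolation.
                acc += m * weight[gy + 1, gx + 1] * bilinear(img, i + gy + dy, j + gx + dx)
            out[i, j] = acc
    return out


# With all offsets zero (and no mask) this reduces to an ordinary zero-padded 3x3 convolution.
rng = np.random.default_rng(0)
img = rng.standard_normal((5, 6))
weight = rng.standard_normal((3, 3))
ref = np.array([[np.sum(np.pad(img, 1)[i:i + 3, j:j + 3] * weight) for j in range(6)] for i in range(5)])
print(np.allclose(deform_conv3x3(img, weight, np.zeros((5, 6, 9, 2))), ref))  # True
```

The nested loops favor readability over speed; production implementations vectorize the sampling on the GPU, which is where the inference overhead discussed in these excerpts comes from.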

“…Even though the performance boost is fairly decent when directly plugging in the deformable convolution layers following the design choice in [13], the speed overhead is quite significant as well (see Table 7). This is because there are 30 layers with deformable convolutions when using ResNet-101.…”

Section: Deformable Convolution With Intervals (mentioning, confidence: 99%)
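The figure of 30 deformable layers follows from ResNet-101's stage depths: its bottleneck stages C2 to C5 contain 3, 4, 23 and 3 blocks, so converting the 3x3 convolution in every block of C3 to C5 gives 4 + 23 + 3 = 30. The sketch below counts layers for a hypothetical `interval` parameter, assuming a simple per-stage rule (always convert a stage's first block, then every `interval`-th block after it). That rule is an illustrative assumption, not necessarily the exact placement YOLACT++ uses.

```python
# Bottleneck blocks per stage in ResNet-101.
RESNET101_BLOCKS = {"C2": 3, "C3": 4, "C4": 23, "C5": 3}


def dcn_block_indices(num_blocks, interval):
    """Blocks in one stage that get a deformable 3x3 conv (assumed rule: 0, interval, 2*interval, ...)."""
    return list(range(0, num_blocks, interval))


def count_dcn_layers(stages=("C3", "C4", "C5"), interval=1, blocks=RESNET101_BLOCKS):
    """Total number of deformable layers over the chosen stages."""
    return sum(len(dcn_block_indices(blocks[s], interval)) for s in stages)


if __name__ == "__main__":
    print(count_dcn_layers(interval=1))  # 30: every block in C3 to C5, the DCNv2 placement
    print(count_dcn_layers(interval=3))  # 11 under the assumed rule (2 + 8 + 1)
```

Under this rule, raising the interval shrinks the number of deformable layers roughly in proportion, which is the lever the excerpts describe for trading a small accuracy loss for a large cut in speed overhead.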

“…To further improve the performance of our model over our conference paper version [11], in Section 6, we propose YOLACT++. Specifically, we incorporate deformable convolutions [12], [13] into the backbone network, which provide more flexible feature sampling and strengthen its capability of handling instances with different scales, aspect ratios, and rotations. Furthermore, we optimize the prediction heads with better anchor scale and aspect ratio choices for larger object recall.…”

Section: Introduction (mentioning, confidence: 99%)
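The anchor changes mentioned above revolve around two per-level hyperparameters, scales and aspect ratios. A common parameterization, assumed here rather than taken from the YOLACT++ text, keeps each anchor's area equal to the scale squared while setting width/height to the aspect ratio, so w = s * sqrt(r) and h = s / sqrt(r). The base scale of 24 and the scale multipliers below are hypothetical values chosen only to show how adding scales or ratios multiplies the number of anchors per location.

```python
import math
from itertools import product


def anchor_shapes(base_scale, aspect_ratios, scale_multipliers=(1.0,)):
    """Return (width, height) pairs for one feature-pyramid level.

    Each anchor has area (base_scale * multiplier) ** 2 and width / height equal to its aspect ratio.
    """
    shapes = []
    for mult, ratio in product(scale_multipliers, aspect_ratios):
        s = base_scale * mult
        shapes.append((s * math.sqrt(ratio), s / math.sqrt(ratio)))
    return shapes


# Hypothetical settings: three aspect ratios, optionally three scales per level.
ratios = (1.0, 0.5, 2.0)
print(len(anchor_shapes(24, ratios)))                                      # 3 anchors per location
print(len(anchor_shapes(24, ratios, (1.0, 2 ** (1 / 3), 2 ** (2 / 3)))))  # 9 anchors per location
```

More anchor shapes per location improve the chance that a large or elongated object overlaps some anchor well, at the cost of more predictions for the head to score.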

YOLACT: Real-Time Instance Segmentation

Bolya, Zhou, Xiao, et al. (2019). 2019 IEEE/CVF International Conference on Computer Vision (ICCV)

We present a simple, fully-convolutional model for real-time (> 30 fps) instance segmentation that achieves competitive results on MS COCO evaluated on a single Titan Xp, which is significantly faster than any previous state-of-the-art approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks and (2) predicting per-instance mask coefficients. Then we produce instance masks by linearly combining the prototypes with the mask coefficients. We find that because this process doesn't depend on repooling, this approach produces very high-quality masks and exhibits temporal stability for free. Furthermore, we analyze the emergent behavior of our prototypes and show they learn to localize instances on their own in a translation variant manner, despite being fully-convolutional. We also propose Fast NMS, a drop-in replacement for standard NMS that is 12 ms faster and has only a marginal performance penalty. Finally, by incorporating deformable convolutions into the backbone network, optimizing the prediction head with better anchor scales and aspect ratios, and adding a novel fast mask re-scoring branch, our YOLACT++ model can achieve 34.1 mAP on MS COCO at 33.5 fps, which is fairly close to state-of-the-art approaches while still running in real time.
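The abstract's two core mechanisms can be sketched in a few lines of NumPy. `assemble_masks` linearly combines k prototype masks with each instance's k coefficients; the sigmoid applied afterwards follows the original YOLACT paper but should be treated as an assumption here. `fast_nms` follows the Fast NMS idea as we understand it: sort detections by score, keep only the upper triangle of the pairwise IoU matrix so each box is compared only with higher-scoring boxes, and drop any box whose largest such IoU exceeds the threshold. Because already-suppressed boxes can still suppress others, it may remove slightly more boxes than standard NMS, which is why it can cost a little accuracy. The sketch is single-class with boxes as (x1, y1, x2, y2), and all names are hypothetical.

```python
import numpy as np


def assemble_masks(prototypes, coeffs):
    """prototypes: (H, W, k), coeffs: (n, k) -> (n, H, W) soft instance masks."""
    linear = np.einsum("hwk,nk->nhw", prototypes, coeffs)  # linear combination per instance
    return 1.0 / (1.0 + np.exp(-linear))                    # sigmoid (assumed, as in YOLACT)


def pairwise_iou(boxes):
    """IoU matrix for (n, 4) boxes given as (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area[:, None] + area[None, :] - inter)


def fast_nms(boxes, scores, iou_threshold=0.5):
    """Return indices (into the original arrays) of boxes kept by single-class Fast NMS."""
    if len(boxes) == 0:
        return np.array([], dtype=int)
    order = np.argsort(-scores)                     # highest score first
    iou = np.triu(pairwise_iou(boxes[order]), k=1)  # iou[i, j] kept only for i < j
    max_iou = iou.max(axis=0)                       # worst overlap with any higher-scored box
    return order[max_iou <= iou_threshold]


boxes = np.array([[0, 0, 10, 10], [3, 0, 13, 10], [6, 0, 16, 10]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(fast_nms(boxes, scores))  # [0]
```

In this example standard NMS would keep boxes 0 and 2: box 1 is suppressed by box 0 (IoU 70/130, about 0.54), after which box 2 only overlaps box 0 (IoU 40/160 = 0.25). Fast NMS also lets the already-suppressed box 1 remove box 2 (IoU about 0.54), trading that small loss for a fully parallel matrix computation.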

Novel activation function with pixelwise modeling capacity for lightweight neural network design

Liu, Guo, Tan, et al. (2021). Concurrency and Computation

The development of lightweight networks makes neural networks efficient enough to be widely applied to various tasks. Considering deployment on hardware such as edge devices and mobile phones, we prioritize lightweight networks. However, their accuracy has always lagged far behind SOTA networks. In this article, we present a simple yet effective activation function, called WReLU, that significantly improves the performance of lightweight networks by adding a residual spatial condition. Moreover, we use a strategy to switch activation functions after determining which convolutional layers to use. We perform experiments on the ImageNet 2012 classification dataset on CPUs, GPUs, and edge devices. Experiments demonstrate that WReLU significantly improves classification accuracy. Meanwhile, our strategy balances the effect of additional parameters and multiply-accumulate operations. Our method improves the accuracy of SqueezeNet and SqueezeNext by more than 5% without substantially increasing parameters or computation. Lightweight networks with larger numbers of parameters, such as MobileNet and ShuffleNet, also improve significantly. Additionally, the inference speed of most lightweight networks using our WReLU strategy is almost the same as that of the baseline model on different platforms. Our approach not only ensures the practicability of lightweight networks but also improves their performance.

AAEE‐Net: Attention‐guided aggregation and error‐aware enhancement network for accurate and efficient stereo matching

Liu, Zhang, et al. (2023). Concurrency and Computation

Stereo matching is a fundamental and long‐standing task in computer vision. Although learning‐based stereo matching algorithms have made remarkable progress, two major challenges persist. First, existing cost aggregation methods that use stacked three‐dimensional convolutions are complex, leading to heavy computation and memory costs. Second, these methods still struggle to establish reliable matches in weakly matchable regions such as edges and thin structures. To overcome these limitations, we propose an accurate and efficient network called Attention‐guided Aggregation and Error‐aware Enhancement Network (AAEE‐Net). We design an Attention‐guided Aggregation Mechanism (AAM) based on simple image features. This mechanism uses attention weights generated from image features to guide cost aggregation with a more efficient and effective strategy. Additionally, we propose an Error‐aware Enhancement Module (EEM) that refines the raw disparity by combining high‐frequency information from the original image with the warp error between the left and right views. EEM enables the network to learn to correct errors, producing fine details and sharp edges. The experimental results on the SceneFlow and KITTI benchmark datasets demonstrate that AAEE‐Net achieves state‐of‐the‐art performance with low inference time. The qualitative results show that AAEE‐Net significantly improves predictions, especially for thin structures.
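The warp error mentioned in the abstract can be illustrated with the standard photometric check used in stereo: a left-view pixel at column x should match the right-view pixel at column x - d, where d is its disparity, so warping the right image by the predicted disparity and subtracting it from the left image highlights likely errors. The sketch below is a generic NumPy version under that assumption (grayscale images, linear interpolation along each row, pixels that warp outside the right image masked out); it is not AAEE‐Net's actual EEM, and the function names are hypothetical.

```python
import numpy as np


def warp_right_to_left(right, disparity):
    """Resample the right image into the left view: out[y, x] = right[y, x - disparity[y, x]]."""
    h, w = right.shape
    cols = np.arange(w, dtype=float)
    warped = np.zeros((h, w))
    valid = np.zeros((h, w), dtype=bool)
    for y in range(h):
        src = cols - disparity[y]                   # source column in the right image
        warped[y] = np.interp(src, cols, right[y])  # linear interpolation along the row
        valid[y] = (src >= 0) & (src <= w - 1)      # np.interp clamps, so flag out-of-range samples
    return warped, valid


def warp_error(left, right, disparity):
    """Absolute photometric error between left and the warped right image (0 where invalid)."""
    warped, valid = warp_right_to_left(right, disparity)
    return np.where(valid, np.abs(left - warped), 0.0)


rng = np.random.default_rng(0)
left = rng.random((4, 10))
right = np.roll(left, -2, axis=1)  # synthetic pair: right[:, x] == left[:, x + 2]
err = warp_error(left, right, np.full((4, 10), 2.0))
print(err.max())  # 0 (up to floating point) for the correct disparity
```

With a wrong disparity the error map lights up wherever the two views disagree, which is the kind of signal a refinement module can use to locate and correct mistakes near edges and thin structures.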
