Standard convolution operations perform feature extraction and representation effectively but incur high computational cost, largely because a full convolution kernel is generated for every channel of the feature map, which introduces unnecessary redundancy. In this paper, we focus on kernel generation and present an interpretable span strategy, named SpanConv, for the effective construction of the kernel space. Specifically, we first learn two single-channel navigated kernels as bases, then extend the two kernels by learnable coefficients, and finally span the two sets of kernels by their linear combination to construct the so-called SpanKernel. The proposed SpanConv is realized by replacing the plain convolution kernel with the SpanKernel. To verify the effectiveness of SpanConv, we design a simple network built on it. Experiments demonstrate that the proposed network significantly reduces parameters compared with benchmark networks for remote sensing pansharpening, while achieving competitive performance and excellent generalization. Code is available at https://github.com/zhi-xuan-chen/IJCAI-2022_SpanConv.
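To make the span idea concrete, below is a minimal PyTorch sketch of a SpanConv-style layer. The class and parameter names (SpanConv2d, base1, coef1, etc.) are our own illustration of the mechanism the abstract describes, not the authors' released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanConv2d(nn.Module):
    """Minimal sketch of a SpanConv-style layer (hypothetical names).

    Two single-channel "navigated" base kernels are extended to the
    full (out_ch, in_ch) kernel grid by learnable coefficients, and
    their linear combination spans the final SpanKernel.
    """
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        # Two single-channel k x k base kernels.
        self.base1 = nn.Parameter(torch.randn(1, 1, k, k) * 0.1)
        self.base2 = nn.Parameter(torch.randn(1, 1, k, k) * 0.1)
        # Learnable coefficients extending each base over every
        # (out_ch, in_ch) kernel slot.
        self.coef1 = nn.Parameter(torch.randn(out_ch, in_ch, 1, 1) * 0.1)
        self.coef2 = nn.Parameter(torch.randn(out_ch, in_ch, 1, 1) * 0.1)

    def forward(self, x):
        # Span the kernel space: the linear combination of the two
        # coefficient-extended bases forms the full SpanKernel,
        # broadcast to shape (out_ch, in_ch, k, k).
        span_kernel = self.coef1 * self.base1 + self.coef2 * self.base2
        return F.conv2d(x, span_kernel, padding=self.k // 2)

# Usage sketch: drop-in replacement for a plain 3x3 convolution.
layer = SpanConv2d(64, 64)
y = layer(torch.randn(1, 64, 32, 32))  # -> (1, 64, 32, 32)
```

Because the k x k spatial structure lives only in the two bases, a 3x3 layer mapping 64 to 64 channels needs 2·9 + 2·64·64 = 8,210 weights in this sketch, versus 36,864 for a standard convolution, which is where the parameter savings come from.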
Pansharpening is a critical yet challenging low-level vision task that aims to obtain a higher-resolution image by fusing a multispectral (MS) image and a panchromatic (PAN) image. While most pansharpening methods are based on convolutional neural network (CNN) architectures with standard convolution operations, few attempts have been made with context-adaptive/dynamic convolution, which delivers impressive results on high-level vision tasks. In this paper, we propose a novel strategy, dubbed LAGConv, that generates local-context adaptive (LCA) convolution kernels and introduces a new global harmonic (GH) bias mechanism, exploiting local image specificity while integrating global information. The proposed LAGConv can replace the context-agnostic standard convolution to fully perceive the particularity of each pixel for the task of remote sensing pansharpening. Furthermore, by applying LAGConv, we build an image fusion network architecture that is more effective than conventional CNN-based pansharpening approaches. The superiority of the proposed method is demonstrated by extensive experiments on a wide range of datasets, compared with state-of-the-art pansharpening methods. Further discussions verify that the proposed LAGConv outperforms recent adaptive convolution techniques for pansharpening.
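A rough sketch of how a local-context adaptive convolution with a global bias could be realized is given below. The structure (per-pixel kernel modulation via unfold, a global bias from pooled features) and all names are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LAGConvSketch(nn.Module):
    """Minimal sketch of a local-context adaptive convolution with a
    global bias (hypothetical names, not the official LAGConv code).
    """
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k, self.in_ch, self.out_ch = k, in_ch, out_ch
        # Shared base weight, flattened to (out_ch, in_ch * k * k).
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch * k * k) * 0.05)
        # Predicts a k*k spatial modulation per pixel from its local context.
        self.kernel_gen = nn.Conv2d(in_ch, k * k, k, padding=k // 2)
        # Global bias: per-channel bias derived from globally pooled features.
        self.gh_bias = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Extract local k x k patches: (B, C*k*k, H*W).
        patches = F.unfold(x, self.k, padding=self.k // 2)
        # Per-pixel kernel modulation, broadcast across input channels.
        mod = torch.sigmoid(self.kernel_gen(x)).view(b, 1, self.k * self.k, h * w)
        patches = patches.view(b, c, self.k * self.k, h * w) * mod
        patches = patches.view(b, c * self.k * self.k, h * w)
        # Apply the shared weight to the context-modulated patches.
        out = torch.einsum('oi,bin->bon', self.weight, patches)
        out = out.view(b, self.out_ch, h, w)
        return out + self.gh_bias(x)  # add the global bias

# Usage sketch: context-aware stand-in for a plain 3x3 convolution.
layer = LAGConvSketch(8, 16)
y = layer(torch.randn(1, 8, 32, 32))  # -> (1, 16, 32, 32)
```

The point of the design is that the effective kernel differs per pixel (via the modulation) while the bias is informed by the whole image, which is the local-plus-global combination the abstract argues standard, context-agnostic convolution lacks.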
Despite the success of vision Transformers for the image deraining task, they suffer from heavy computation and slow runtime. In this work, we show that the Transformer decoder is not necessary and incurs huge computational cost. We therefore revisit the standard vision Transformer as well as its successful variants and propose a novel Decoder-Free Transformer-Like (DFTL) architecture for fast and accurate single image deraining. Specifically, we adopt a cheap linear projection that represents visual information at lower computational cost than previous linear projections. We then replace the standard Transformer decoder block with a designed Progressive Patch Merging (PPM) module, which attains comparable performance with higher efficiency. Through these modules, DFTL significantly reduces computation and GPU memory requirements. Extensive experiments demonstrate the superiority of DFTL over competitive Transformer architectures, e.g., ViT, DETR, IPT, Uformer, and Restormer. The code is available at https://github.com/XiaoXiao-Woo/derain.
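As a sketch of the decoder-free idea, the following hypothetical PPM-style head reshapes encoder tokens into a feature map and progressively merges them back to image resolution with convolution + PixelShuffle stages. The concrete structure and all names are assumptions for illustration, not the released DFTL code:

```python
import math

import torch
import torch.nn as nn

class ProgressivePatchMerging(nn.Module):
    """Sketch of a PPM-style decoder-free head (hypothetical design).

    Encoder tokens are reshaped to a feature map and progressively
    merged back to full image resolution, replacing the Transformer
    decoder with cheap convolution + PixelShuffle stages.
    """
    def __init__(self, dim, patch=4, out_ch=3):
        super().__init__()
        stages, ch = [], dim
        # One 2x upsampling stage per factor of 2 in the patch size.
        for _ in range(int(math.log2(patch))):
            stages += [
                nn.Conv2d(ch, ch * 2, 3, padding=1),  # expand channels
                nn.PixelShuffle(2),                   # 2x resolution, (ch*2)/4 channels
                nn.GELU(),
            ]
            ch //= 2
        self.merge = nn.Sequential(*stages)
        self.proj = nn.Conv2d(ch, out_ch, 3, padding=1)

    def forward(self, tokens, grid_hw):
        # tokens: (B, N, C) from the encoder; grid_hw: patch-grid size (h, w).
        b, n, c = tokens.shape
        h, w = grid_hw
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.proj(self.merge(x))

# Usage sketch: 64-dim tokens over a 32x32 patch grid -> 128x128 RGB output.
head = ProgressivePatchMerging(dim=64, patch=4)
out = head(torch.randn(2, 32 * 32, 64), (32, 32))  # -> (2, 3, 128, 128)
```

Relative to a full Transformer decoder, such a head has no attention layers at all, which is one plausible way the computation and GPU memory savings claimed above could be obtained.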