An increasing number of applications in computer vision, especially in medical imaging and remote sensing, involve the challenging task of classifying very large images that contain tiny objects. More specifically, these classification tasks face two key challenges: i) the input images in the target dataset are usually on the order of megapixels, yet existing deep architectures cannot easily operate on such large images due to memory constraints, so a memory-efficient method is needed to process them; and ii) only a small fraction of the input image is informative of the label of interest, resulting in a low region-of-interest (ROI) to image ratio. However, most current convolutional neural networks (CNNs) are designed for image classification datasets with relatively large ROIs and small (sub-megapixel) images. Existing approaches have addressed these two challenges only in isolation. We present an end-to-end CNN model, termed Zoom-In network, that leverages hierarchical attention sampling to classify large images with tiny objects using a single GPU. We evaluate our method on two large-image datasets and one gigapixel dataset. Experimental results show that our model achieves higher accuracy than existing methods while requiring fewer computing resources.
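To make the core idea concrete before the formal presentation, the following is a minimal sketch of attention-based patch sampling, assuming a PyTorch-style setup. It shows a single zoom level, in which an attention map computed on a downsampled view guides which full-resolution patches are processed; the hierarchical version would repeat this at successively finer scales. All names and sizes here (AttentionSampler, patch, n_patches, the small CNNs) are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSampler(nn.Module):
    """One level of attention-guided patch sampling (illustrative sketch)."""
    def __init__(self, patch=32, n_patches=8):
        super().__init__()
        self.patch, self.n_patches = patch, n_patches
        # Small CNN scoring a cheap, low-resolution view of the image.
        self.attn = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1),
        )
        # Feature extractor applied only to the sampled high-res patches.
        self.feat = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, 2)  # e.g., metastasis vs. benign

    def forward(self, full_img):
        # 1) Score a downsampled view instead of the full image.
        low = F.interpolate(full_img, scale_factor=0.125, mode="bilinear")
        probs = torch.softmax(self.attn(low).flatten(1), dim=1)
        # 2) Sample patch locations proportional to attention.
        #    (Training the attention map itself requires a Monte Carlo
        #    gradient estimator, omitted here for brevity.)
        idx = torch.multinomial(probs, self.n_patches, replacement=True)
        B, _, H, W = full_img.shape
        Hl, Wl = low.shape[-2:]
        feats = []
        for b in range(B):
            for i in idx[b]:
                # Map the low-res cell back to full-res coordinates.
                y = int(i) // Wl * (H // Hl)
                x = int(i) % Wl * (W // Wl)
                p = full_img[b:b + 1, :, y:y + self.patch, x:x + self.patch]
                # Zero-pad patches clipped at the image border.
                p = F.pad(p, (0, self.patch - p.shape[-1],
                              0, self.patch - p.shape[-2]))
                feats.append(self.feat(p))
        # 3) Aggregate patch features and classify.
        feats = torch.stack(feats).view(B, self.n_patches, -1).mean(1)
        return self.head(feats)

model = AttentionSampler()
x = torch.randn(1, 3, 1024, 1024)  # stand-in for a large image
print(model(x).shape)              # torch.Size([1, 2])
```

The memory saving comes from step 2: only n_patches × patch² pixels are ever pushed through the feature extractor at full resolution, regardless of how large the input image is.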
Introduction

Neural networks have achieved state-of-the-art performance in many image classification tasks [1]. However, there remain many scenarios in which they can be improved. Applying modern deep neural networks to image inputs of very high resolution is a non-trivial problem due to the challenges of scaling model architectures [2]. Such images are common, for instance, in satellite and medical imaging. Moreover, these images tend to grow even larger due to the rapid growth in computational and memory availability, as well as advances in camera sensor technology. Especially challenging are the so-called tiny-object image classification tasks, where the goal is to classify images based on the information in very small objects or regions of interest (ROIs), in the presence of a much larger background that is uncorrelated with, or uninformative of, the label; consequently, the input image has a very low ROI-to-image ratio.

Recent work [3] showed that, with a dataset of limited size, convolutional neural networks (CNNs) perform poorly on very low ROI-to-image ratio problems. In these settings, the input resolution grows from typical image sizes, e.g., 224 × 224 pixels, to gigapixel images ranging from 45,056 × 35,840 to 217,088 × 111,104 pixels [4], which not only requires significantly more computation per image for a fixed deep architecture, but in some cases becomes prohibitive for current GPU-memory standards (see the back-of-the-envelope calculation below). Figure 1 shows an example of a gigapixel image, from which we see that manually annotated ROIs (with cancer metastases), not usually available for model training, constitute a tiny proportion of ...
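To put the cited image sizes in perspective, the following is a back-of-the-envelope memory calculation. It is our own illustration, not a figure from the paper; it assumes only float32 storage and the largest image size quoted above.

```python
# Illustrative memory-footprint check for gigapixel inputs (float32).
bytes_per_float = 4  # float32

typical = 224 * 224 * 3 * bytes_per_float       # standard CNN input
giga = 217_088 * 111_104 * 3 * bytes_per_float  # largest size cited above

print(f"224 x 224 input tensor:   {typical / 2**20:.2f} MiB")  # ~0.57 MiB
print(f"217,088 x 111,104 input:  {giga / 2**30:.1f} GiB")     # ~269.6 GiB

# A single 64-channel float32 feature map at full resolution would take
# 217_088 * 111_104 * 64 * 4 bytes, roughly 6.2 TB -- orders of magnitude
# beyond the memory of any single GPU.
```

Even before a single convolution is applied, the raw input alone exceeds current GPU memory by an order of magnitude, which is why full-resolution processing must be restricted to a small set of sampled sub-regions.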