Sparse, Quantized, Full Frame CNN for Low Power Embedded Devices

Mathew, Manu; Desappan, Kumar; Swami, Pramod; Nagori, Soyeb

doi:10.1109/cvprw.2017.46

Cited by 28 publications

(21 citation statements)

References 5 publications


“…The function of the backbone encoder is to process an input image and extract rich abstract features from the input image that represent the crucial information in the image. Instead of adopting very deep and wide architectures such as AlexNet [19], GoogleNet [20], DenseNet [21], ResNet101 [22] and VGG16 [23] that comprise numerous parameters and incurs higher computation cost, an open source lightweight architecture named JacintoNet [24] which is designed for embedded devices is adopted in the proposed method. The JacintoNet is a modified ResNet-10 by removing the shortcut connection.…”

Section: A Backbone Encoder (mentioning)

confidence: 99%

MTSAN: Multi-Task Semantic Attention Network for ADAS Applications

Lai

Shivanna

et al. 2021

IEEE Access


This paper presents a lightweight Multi-task Semantic Attention Network (MTSAN) to collectively deal with object detection as well as semantic segmentation aiding real-time applications of the Advanced Driver Assistance Systems (ADAS). This paper proposes a Semantic Attention Module (SAM) that introduces the semantic contextual clues from a segmentation subnet to guide a detection subnet. The SAM significantly boosts up the detection performance and computational cost by considerably decreasing the false alarm rate and it is completely independent of any other parameters. The experimental results show the effectiveness of each component of the network and demonstrate that the proposed MTSAN yields a better balance between accuracy and speed. Following the post-processing methods, the proposed module is tested and proved for its accuracy in the Lane Departure Warning System (LDWS) and Forward Collision Warning System (FCWS). In addition, the proposed lightweight network is deployable on low-power embedded devices to meet the requirements of the real-time applications yielding 10FPS @ 512 X 256 on NVIDIA Jetson Xavier and 15FPS @ 512 X 256 on Texas Instruments' TDA2x.


“…Sparse representations get more efficient when supported by proper hardware. Examples for integrated accelerators are in the Texas Instruments TDAx processor [18] family or in the custom ASIC described in [19]. Unfortunately, low power budgets prevent the use of such components on MCUs.…”

Section: A Pruning (mentioning)

confidence: 99%

Optimality Assessment of Memory-Bounded ConvNets Deployed on Resource-Constrained RISC Cores

2019


A cost-effective implementation of Convolutional Neural Nets on the mobile edge of the Internet-of-Things (IoT) requires smart optimizations to fit large models into memory-constrained cores. Reduction methods that use a joint combination of filter pruning and weight quantization have proven efficient in searching the compression that ensures minimum model size without accuracy loss. However, there exist other optimal configurations that stem from the memory constraint. The objective of this work is to make an assessment of such memory-bounded implementations and to show that most of them are centred on specific parameter settings that are found difficult to be implemented on a low-power RISC. Hence, the focus is on quantifying the distance to optimality of the closest implementations that instead can be actually deployed on hardware. The analysis is powered by a two-stage framework that efficiently explores the memory-accuracy space using a lightweight, hardware-conscious heuristic optimization. Results are collected from three realistic IoT tasks (Image Classification on CIFAR-10, Keyword Spotting on the Speech Commands Dataset, Facial Expression Recognition on Fer2013) run on RISC cores (Cortex-M by ARM) with few hundreds KB of on-chip RAM. INDEX TERMS Neural networks, Internet of Things, optimization methods, low power electronics.


“…There have been works which apply these techniques for semantic segmentation [14]. However in [14], the focus was on a coarse segmentation on only a few classes, while we are attempting to build efficient models for the Cityscapes benchmark [3] with all the classes. Newer approaches to model compression, involves designing efficient CNN Modules.…”

Section: Model Compression (mentioning)

confidence: 99%

“…Grouped convolution is another way of building structured sparse convolutions. Such a convolution with groups parameter g, decreases the parameter and FLOPs of the layer by a factor g. Grouped convolutions also help in data bandwidth reduction [14], was first implemented by AlexNet [11]. It has been used for efficient layer design in ResNext [21].…”

Section: Grouped Convolutions (mentioning)

confidence: 99%

Efficient Semantic Segmentation Using Gradual Grouping

Vallurupalli

Annamaneni

Varma

et al. 2018

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Self Cite


Deep CNNs for semantic segmentation have high memory and run time requirements. Various approaches have been proposed to make CNNs efficient like grouped, shuffled, depth-wise separable convolutions. We study the effectiveness of these techniques on a real-time semantic segmentation architecture like ERFNet for improving runtime by over 5X. We apply these techniques to CNN layers partially or fully and evaluate the testing accuracies on Cityscapes dataset. We obtain accuracy vs parameters/FLOPs trade offs, giving accuracy scores for models that can run under specified runtime budgets. We further propose a novel training procedure which starts out with a dense convolution but gradually evolves towards a grouped convolution. We show that our proposed training method and efficient architecture design can improve accuracies by over 8% with depthwise separable convolutions applied on the encoder of ERFNet and attaching a light weight decoder. This results in a model which has a 5X improvement in FLOPs while only suffering a 4% degradation in accuracy with respect to ERFNet.
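
The grouped-convolution statement quoted above says that a convolution with groups parameter g has its parameters and FLOPs reduced by a factor g. The following is a minimal sketch of that arithmetic only; the function name `conv2d_cost` and the layer sizes are hypothetical, chosen for illustration, and are not taken from any of the cited papers. Bias terms are ignored and cost is counted in multiply-accumulates.

```python
def conv2d_cost(c_in, c_out, kernel, out_h, out_w, groups=1):
    """Return (weights, multiply-accumulates) for a 2D convolution without bias.

    Hypothetical helper for illustration; not from the cited papers.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # Each output channel is connected to only c_in / groups input channels.
    weights = c_out * (c_in // groups) * kernel * kernel
    # Every weight is used once per output spatial position.
    macs = weights * out_h * out_w
    return weights, macs


# Assumed example layer: 64 -> 128 channels, 3x3 kernel, 32x32 output map.
dense = conv2d_cost(64, 128, 3, 32, 32, groups=1)
grouped = conv2d_cost(64, 128, 3, 32, 32, groups=4)

print("dense   (weights, MACs):", dense)
print("grouped (weights, MACs):", grouped)
print("reduction factor:", dense[0] // grouped[0])
```

With g = 4 in this assumed layer, both the weight count and the multiply-accumulate count come out four times smaller than in the dense case, which matches the factor-g reduction described in the quoted statement.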
