Computation Reuse in DNNs by Exploiting Input Similarity

Riera, Marc; Arnau, José-María; González, Antonio

doi:10.1109/isca.2018.00016

Cited by 95 publications

(49 citation statements)

References 28 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…An analysis of the output results reveals that many neurons produce very similar outputs for consecutive elements in the input sequence. On average, the relative difference between the current and previous output of a neuron is smaller than 23% in our set of RNNs, whereas previous work in [28] has reported similar results. Since RNNs are inherently error tolerant [36], we propose to exploit the aforementioned property to save computations by using a neuron-level fuzzy memoization scheme.…”

Section: Introductionsupporting

confidence: 60%

“…Note that RNNs are used in sequence processing problems such as speech recognition or video processing, where RNN inputs in consecutive time steps tend to be extremely similar. Prior work in [28] reports high similarity across consecutive frames of audio or video. Not surprisingly, our own numbers for our set of RNNs also support this claim.…”

Section: Rnns Redundancymentioning

confidence: 99%

See 1 more Smart Citation

Neuron-Level Fuzzy Memoization in RNNs

Silfa

Dot

Arnau

et al. 2019

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture

Self Cite

View full text Add to dashboard Cite

Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation. Unlike conventional feed-forward DNNs, RNNs remember past information to improve the accuracy of future predictions and, therefore, they are very effective for sequence processing problems. For each application run, each recurrent layer is executed many times for processing a potentially large sequence of inputs (words, images, audio frames, etc.). In this paper, we make the observation that the output of a neuron exhibits small changes in consecutive invocations. We exploit this property to build a neuron-level fuzzy memoization scheme, which dynamically caches the output of each neuron and reuses it whenever it is predicted that the current output will be similar to a previously computed result, avoiding in this way the output computations. The main challenge in this scheme is determining whether the new neuron's output for the current input in the sequence will be similar to a recently computed result. To this end, we extend the recurrent layer with a much simpler Bitwise Neural Network (BNN), and show that the BNN and RNN outputs are highly correlated: if two BNN outputs are very similar, the corresponding outputs in the original RNN layer are likely to exhibit negligible changes. The BNN provides a low-cost and effective mechanism for deciding when fuzzy memoization can be applied with a small impact on accuracy. We evaluate our memoization scheme on top of a state-of-the-art accelerator for RNNs, for a variety of different neural networks from multiple application domains. We show that our technique avoids more than 24.2% of computations, resulting in 18.5% energy savings and 1.35x speedup on average. CCS CONCEPTS • Computer systems organization → Neural Networks; • Computing Methodologies → Machine Learning;

show abstract

Section: Introductionsupporting

confidence: 60%

Section: Rnns Redundancymentioning

confidence: 99%

Neuron-Level Fuzzy Memoization in RNNs

Silfa

Dot

Arnau

et al. 2019

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture

Self Cite

View full text Add to dashboard Cite

show abstract

“…Cross-frame optimization. It is a new research direction to reduce CNN computation by exploiting temporal redundancy [8,67] or input similarity [50] across video or audio frames. Moreover, this concept has also been applied to compensate the unreliability brought by pruning [60].…”

Section: Related Workmentioning

confidence: 99%

eCNN

Huang

Ding

Wang

et al. 2019

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture

View full text Add to dashboard Cite

Convolutional neural networks (CNNs) have recently demonstrated superior quality for computational imaging applications. Therefore, they have great potential to revolutionize the image pipelines on cameras and displays. However, it is difficult for conventional CNN accelerators to support ultra-high-resolution videos at the edge due to their considerable DRAM bandwidth and power consumption. Therefore, finding a further memory-and computation-efficient microarchitecture is crucial to speed up this coming revolution.In this paper, we approach this goal by considering the inference flow, network model, instruction set, and processor design jointly to optimize hardware performance and image quality. We apply a block-based inference flow which can eliminate all the DRAM bandwidth for feature maps and accordingly propose a hardwareoriented network model, ERNet, to optimize image quality based on hardware constraints. Then we devise a coarse-grained instruction set architecture, FBISA, to support power-hungry convolution by massive parallelism. Finally, we implement an embedded processor-eCNN-which accommodates to ERNet and FBISA with a flexible processing architecture. Layout results show that it can support high-quality ERNets for super-resolution and denoising at up to 4K Ultra-HD 30 fps while using only DDR-400 and consuming 6.94W on average. By comparison, the state-of-the-art Diffy uses dualchannel DDR3-2133 and consumes 54.3W to support lower-quality VDSR at Full HD 30 fps. Lastly, we will also present application examples of high-performance style transfer and object recognition to demonstrate the flexibility of eCNN.

show abstract

“…The work by Riera et al leverages the error tolerance of deep neural networks (DNN) on a hardware implementation of a reuse‐based DNN accelerator. By using the proposed accelerator, an improvement on energy utilization was observed.…”

Section: Related Workmentioning

confidence: 99%

Optimizing parameter sensitivity analysis of large‐scale microscopy image analysis workflows with multilevel computation reuse

Barreiros

Moreira

Kurç

et al. 2019

Concurrency and Computation

View full text Add to dashboard Cite

Summary Parameter sensitivity analysis (SA) is an effective tool to gain knowledge about complex analysis applications and assess the variability in their analysis results. However, it is an expensive process as it requires the execution of the target application multiple times with a large number of different input parameter values. In this work, we propose optimizations to reduce the overall computation cost of SA in the context of analysis applications that segment high‐resolution slide tissue images, ie, images with resolutions of 100k × 100k pixels. Two cost‐cutting techniques are combined to efficiently execute SA: use of distributed hybrid systems for parallel execution and computation reuse at multiple levels of an analysis pipeline to reduce the amount of computation. These techniques were evaluated using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. Our parallel execution method attained an efficiency of over 90% on 256 nodes. The hybrid execution on the CPU and Intel Phi improved the performance by 2×. Multilevel computation reuse led to performance gains of over 2.9×.

show abstract

Computation Reuse in DNNs by Exploiting Input Similarity

Cited by 95 publications

References 28 publications

Neuron-Level Fuzzy Memoization in RNNs

Neuron-Level Fuzzy Memoization in RNNs

eCNN

Optimizing parameter sensitivity analysis of large‐scale microscopy image analysis workflows with multilevel computation reuse

Contact Info

Product

Resources

About