Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture 2019
DOI: 10.1145/3352460.3358309
Neuron-Level Fuzzy Memoization in RNNs

Abstract: Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation. Unlike conventional feed-forward DNNs, RNNs remember past information to improve the accuracy of future predictions and, therefore, they are very effective for sequence processing problems. For each application run, each recurrent layer is executed many times for processing a potentially large sequence of inputs (words, images, audio frames, etc.). In this paper, we make the obser…
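The abstract is cut off above, but the technique it introduces is reusing (memoizing) a neuron's previously computed output whenever a cheap prediction suggests the new evaluation would be very similar. A minimal sketch of that cache-and-reuse structure follows; the relative-change threshold and the exact reuse test are illustrative assumptions, not the paper's algorithm (the citation statements below indicate the paper's predictor is built from the sign bits of the full-precision weights).

```python
import numpy as np

def fuzzy_memoized_neuron(x, w, cache, threshold=0.05):
    """Evaluate one recurrent neuron with fuzzy memoization.

    x         : input vector at the current timestep
    w         : full-precision weight vector of the neuron
    cache     : dict holding the last cheap prediction and last full output
    threshold : relative-change bound below which the cached output is reused
                (illustrative value, not taken from the paper)
    """
    # Cheap predictor: dot product using only the sign bits of the weights,
    # so no extra weights need to be stored (see the citation statements below).
    cheap = np.sum(np.where(w >= 0, x, -x))

    if cache["cheap"] is not None:
        change = abs(cheap - cache["cheap"]) / (abs(cache["cheap"]) + 1e-8)
        if change < threshold:
            return cache["full"], True            # reuse the memoized output

    full = np.tanh(np.dot(w, x))                  # full-precision evaluation
    cache["cheap"], cache["full"] = cheap, full
    return full, False

# Example: run a slowly varying input sequence and count how often the
# cached output is reused (the reuse rate depends heavily on the threshold).
rng = np.random.default_rng(0)
w = rng.standard_normal(64)
cache = {"cheap": None, "full": None}
reused = 0
for t in range(100):
    x = 1.0 + 0.05 * rng.standard_normal(64)      # slowly varying inputs
    out, hit = fuzzy_memoized_neuron(x, w, cache)
    reused += hit
print(f"reused {reused}/100 neuron evaluations")
```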

Cited by 17 publications (18 citation statements). References 26 publications.
“…Even though most neurons exhibit a high correlation, a significant number of neurons have moderate or even low correlations. This observation is consistent with the observations made by Anderson et al [18] and more recently, Silfa et al [9]. A predictor based on 1-bit weights for a neuron with a low self-correlation coefficient is expected to make frequent mistakes, and consequently, reduce the overall accuracy of the DNN.…”
Section: Exploiting Self-correlation (supporting)
confidence: 92%
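The "self-correlation" discussed in this statement can be read as the per-neuron correlation between the full-precision output and the output predicted with 1-bit (sign) weights. A small sketch under that assumed definition, using a Pearson coefficient over a synthetic profiling set:

```python
import numpy as np

def neuron_self_correlation(X, w):
    """Pearson correlation between a neuron's full-precision pre-activation
    and the pre-activation obtained with 1-bit (sign) weights, over a set of
    profiling inputs X (one row per timestep)."""
    full = X @ w                     # full-precision dot products
    binary = X @ np.sign(w)          # 1-bit weights: only the sign is kept
    return np.corrcoef(full, binary)[0, 1]

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 128))     # synthetic profiling inputs
w = rng.standard_normal(128)             # synthetic neuron weights
print(f"self-correlation: {neuron_self_correlation(X, w):.3f}")
```

Neurons with a coefficient close to 1 are good candidates for the cheap 1-bit predictor; as the statement notes, low-correlation neurons would cause frequent mispredictions.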
“…First, the dot-product between 1-bit valued vectors does not require multipliers, simplifying the hardware to a large extent. Second, since the 1-bit weights are obtained from the sign bits of the full-precision weights [9], they do not incur any memory footprint overhead, since they do not have to be stored separately but can be obtained directly from the full-precision weights.…”
Section: Exploiting Self-correlation (mentioning)
confidence: 99%
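To make the quoted point concrete, here is a sketch of a multiplier-free dot product with sign-bit weights; the packed XNOR/popcount formulation used in real hardware is only hinted at in the comments, and the function name is illustrative.

```python
import numpy as np

def binary_dot(x_bits, w_full):
    """Dot product between a {-1,+1} input vector and the sign bits of the
    full-precision weights.  Each term is just +x or -x, so the hardware needs
    conditional add/subtract (or XNOR + popcount when both operands are packed
    bit vectors) instead of multipliers."""
    w_bits = w_full >= 0                               # sign bits, read on the fly
    agree = np.count_nonzero((x_bits >= 0) == w_bits)  # matching signs contribute +1
    return 2 * agree - x_bits.size                     # agreements minus disagreements

# Sanity check against the naive formulation with explicit +/-1 weights.
rng = np.random.default_rng(2)
x = np.where(rng.random(32) > 0.5, 1, -1)
w = rng.standard_normal(32)
assert binary_dot(x, w) == int(np.dot(x, np.where(w >= 0, 1, -1)))
```

The key point of the quoted statement is that the sign bits are read directly from the stored full-precision weights, so the 1-bit predictor adds no memory footprint.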
“…The memoization technique has been studied in many fields. Franyell et al [71] propose a fuzzy memoization scheme that avoids more than 24.2% of computation for RNN training. Liu et al [44] replace a long sequence of instructions with a two-level lookup table.…”
Section: Related Work (mentioning)
confidence: 99%
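For contrast with the fuzzy scheme above, a generic sketch of table-based memoization in which the lookup key is quantized so that similar inputs hit the same entry; this is an illustration only, not the structure of the cited two-level table or of the fuzzy scheme.

```python
import functools

# Exact memoization reuses a result only when the inputs match exactly; fuzzy
# memoization relaxes this and reuses it when the inputs are merely similar,
# approximated here by quantizing the lookup key (an illustrative choice).

@functools.lru_cache(maxsize=4096)
def _cached_eval(key):
    # stand-in for an expensive computation keyed by the quantized input
    return sum(k * k for k in key)

def fuzzy_lookup(x, step=0.1):
    key = tuple(round(v / step) for v in x)   # nearby inputs map to the same key
    return _cached_eval(key)

print(fuzzy_lookup([0.31, 0.49]) == fuzzy_lookup([0.32, 0.51]))  # True: result reused
```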
“…DeltaRNN [20] and the work in [21] exploit temporal coherency of the LSTM data to reuse computations and avoid redundant memory accesses. Work in [22] improves RNN energy efficiency by skipping computations, since previous calculations are memoized and reused in future evaluations. We focus on changing the bit-width at runtime, and computations are never skipped.…”
Section: Related Work (mentioning)
confidence: 99%
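A sketch of the temporal-coherency idea behind DeltaRNN-style reuse mentioned in this statement: only input elements that changed by more than a threshold since the previous timestep trigger new work. The threshold value and the dense-matrix formulation are illustrative assumptions.

```python
import numpy as np

def delta_matvec(W, x, state, theta=0.01):
    """Delta-style matrix-vector update: only input elements that changed by
    more than `theta` since the last timestep contribute, so the unchanged
    columns of W are skipped (threshold and dense layout are illustrative)."""
    delta = x - state["x_prev"]
    active = np.abs(delta) > theta
    state["y"] += W[:, active] @ delta[active]            # update only the active columns
    state["x_prev"] = np.where(active, x, state["x_prev"])
    return state["y"]

rng = np.random.default_rng(3)
W = rng.standard_normal((8, 16))
state = {"x_prev": np.zeros(16), "y": np.zeros(8)}
x = rng.standard_normal(16)
y = delta_matvec(W, x, state)
# Skipped columns introduce only a small, threshold-bounded error.
assert np.allclose(y, W @ x, atol=16 * 0.01 * np.abs(W).max())
```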