Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '18)
DOI: 10.1145/3174243.3174999
Exploration of Low Numeric Precision Deep Learning Inference Using Intel® FPGAs

Abstract: Convolutional neural networks (CNNs) have been shown to maintain reasonable classification accuracy when quantized to lower precisions; however, quantizing to sub-8-bit activations and weights can cause classification accuracy to fall below an acceptable threshold. Techniques exist for closing the accuracy gap of limited-numeric-precision networks, typically by means of increased computation. This results in a trade-off between throughput and accuracy and can be tailored for different networks through vario…
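The abstract concerns quantizing CNN weights and activations below 8 bits. As context only, here is a minimal sketch of uniform symmetric quantization at an arbitrary bit width, not the paper's specific scheme; the function names and the NumPy setup are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(x, num_bits):
    """Uniform symmetric quantization of a tensor to signed num_bits integers."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(x)) / qmax        # map the largest magnitude onto qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate real-valued tensor from its integer codes."""
    return q.astype(np.float32) * scale

# Reconstruction error grows as the bit width shrinks below 8.
w = np.random.randn(1024).astype(np.float32)
for bits in (8, 6, 4):
    q, s = quantize_symmetric(w, bits)
    err = np.abs(w - dequantize(q, s)).mean()
    print(f"{bits}-bit mean abs error: {err:.5f}")
```

Running the example shows the mean reconstruction error growing as the bit width shrinks, which is the accuracy gap that the techniques discussed in the paper trade throughput to close.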

Cited by 21 publications (20 citation statements). References 17 publications.
“…For instance, the energy-delay-product of the posit EMAC with es = 0 is, on average, 3× and 1.4× less than that of the posit EMAC with es = 2 and es = 1, respectively. On the other hand, the average performance of DNN inference with es = 1 for the posit EMAC across the five datasets and [5,7]-bit precision is 2% and 4% better than with es = 2 and es = 0, respectively. Thus, Deep Positron equipped with the posit (es = 1) EMAC has a better trade-off between energy-delay-product and accuracy for [5,7] bits.…”
Section: Exploiting the Posit Es Parameter
confidence: 92%
“…On the other hand, the average performance of DNN inference with es = 1 for the posit EMAC across the five datasets and [5,7]-bit precision is 2% and 4% better than with es = 2 and es = 0, respectively. Thus, Deep Positron equipped with the posit (es = 1) EMAC has a better trade-off between energy-delay-product and accuracy for [5,7] bits. For 8-bit, the results suggest that es = 1 is a better fit for energy-efficient applications and es = 2 for accuracy-dependent applications.…”
Section: Exploiting the Posit Es Parameter
confidence: 92%
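The quoted statements hinge on the posit es parameter, which sets useed = 2^(2^es) and thus how much dynamic range each regime step covers. As background only, below is a minimal generic posit decoder sketch, not the cited Deep Positron EMAC hardware; the function name and the pure-Python bit handling are assumptions for illustration.

```python
def decode_posit(bits, n, es):
    """Decode an n-bit posit, given as an unsigned integer, with es exponent bits."""
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                       # NaR (not a real)
    sign = (bits >> (n - 1)) & 1
    if sign:                                      # negative posits are stored in two's complement
        bits = (-bits) & ((1 << n) - 1)
    body = format(bits & ((1 << (n - 1)) - 1), f"0{n - 1}b")  # bits after the sign
    lead = body[0]                                # regime: run of identical bits
    run = len(body) - len(body.lstrip(lead))
    k = run - 1 if lead == "1" else -run
    rest = body[run + 1:]                         # skip the regime terminator
    exp = int(rest[:es], 2) if rest[:es] else 0   # up to es exponent bits
    frac_bits = rest[es:]
    frac = int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0
    useed = 2 ** (2 ** es)                        # es controls the dynamic range
    value = useed ** k * 2 ** exp * (1 + frac)
    return -value if sign else value

# Same bit pattern, different es: 4.0 for es=1 versus 16.0 for es=2.
print(decode_posit(0b01100000, n=8, es=1), decode_posit(0b01100000, n=8, es=2))
```

The final print shows how a larger es stretches the representable range of the same bit pattern at the cost of fraction bits, which is the accuracy-versus-range trade-off the citing paper measures.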
“…Training such models requires high computational power; however, once trained, the network can run in real time at 15 fps or more. The works in the literature [61][62][63] give details about the training and computation requirements.…”
Section: Algorithms
confidence: 99%
“…At present, there are two solutions for accelerating CNNs. One is to reduce the computational complexity of the neural network; many such methods that maintain accuracy have been proposed, including quantization, pruning, sparsity and fast convolution [7][8][9][10][11][12][13]. The other is to use a high-performance, low-power hardware accelerator.…”
Section: Introduction
confidence: 99%
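The quoted passage groups quantization, pruning, sparsity and fast convolution as complexity-reduction methods. As an illustrative sketch of just the pruning/sparsity idea, not the cited works' actual algorithms, the snippet below zeroes the smallest-magnitude weights so that sparse kernels can skip multiplications; the function name, threshold choice, and kernel shape are assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights; sparsity is the fraction removed."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.random.randn(3, 3, 64, 64).astype(np.float32)   # a hypothetical conv kernel
pruned = magnitude_prune(w, sparsity=0.75)
print(f"non-zero weights: {np.count_nonzero(w)} -> {np.count_nonzero(pruned)}")
```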