2019
DOI: 10.1109/tcsi.2019.2895216

Recursive Binary Neural Network Training Model for Efficient Usage of On-Chip Memory

Cited by 8 publications (5 citation statements). References 43 publications.
“…We evaluate the performance of EILE by training a fully-connected network with 2 hidden layers (network size: 784-512-256-10, total 1 MB of parameters) on the full MNIST [13] handwritten digit dataset. Activations are quantized to fixed-point Q(8,8) format while weights and gradients are quantized to Q(2,14) format for a batch size of 1, where Q(m, n) denotes the quantization using m bits for the integer part and n bits for the fraction.…”
Section: Results
confidence: 99%
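
The Q(m, n) notation in the excerpt above can be made concrete with a minimal round-and-clip quantizer. This is a generic sketch in Python, not EILE's actual implementation; it assumes a signed convention with a sign bit in addition to the m integer bits, so Q(m, n) spans [-2^m, 2^m - 2^-n] in steps of 2^-n.

```python
import numpy as np

def quantize_q(x, m, n):
    """Round-and-clip to signed fixed-point Q(m, n).

    Assumed convention: a sign bit plus m integer bits and n fractional
    bits, so representable values are multiples of 2**-n in the range
    [-2**m, 2**m - 2**-n].
    """
    step = 2.0 ** -n
    lo, hi = -(2.0 ** m), (2.0 ** m) - step
    return np.clip(np.round(x / step) * step, lo, hi)

# As in the excerpt: activations in Q(8, 8), weights/gradients in Q(2, 14).
acts = quantize_q(np.random.randn(256) * 4.0, m=8, n=8)
wts = quantize_q(np.random.randn(512, 784) * 0.05, m=2, n=14)
```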
“…Other implementations that support on-chip training on the edge include an FPGA design [7] that uses a special memory management unit to alleviate the impact of irregular memory accesses, but its performance during BP is still below that of FP. A recent study [8] exploits a recursive algorithm for training binary neural networks; however, the processing element (PE) utilization efficiency of the FP and BP phases was not reported. Other implementations [9], [10], [11] use custom PE architectures to support on-chip training of different DNN architectures, but their performance either decreases during BP [9] or decreases with smaller batch sizes [10], [11].…”
Section: Introduction
confidence: 99%
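
For background on what "training binary neural networks" in the excerpt above typically involves: BNN training commonly keeps real-valued latent weights, binarizes them with a sign function for the forward pass (FP), and updates the latent weights during backpropagation (BP) through a straight-through estimator. The sketch below is a hypothetical, generic illustration of that pattern, not the recursive scheme of the paper under discussion; all names are illustrative.

```python
import numpy as np

def binarize(w_real):
    """Forward-pass weights: sign of the latent real-valued weights."""
    return np.where(w_real >= 0.0, 1.0, -1.0)

def ste_update(w_real, grad_wrt_binary, lr=0.01):
    """Straight-through estimator: apply the gradient computed w.r.t. the
    binary weights directly to the latent weights, then clip the latent
    weights to [-1, 1] so they stay in the binarization range."""
    return np.clip(w_real - lr * grad_wrt_binary, -1.0, 1.0)

w = np.random.uniform(-1, 1, size=(256, 512))  # latent (real-valued) weights
w_bin = binarize(w)                            # binary weights used in FP
grad = np.random.randn(*w_bin.shape) * 0.1     # stand-in for a BP gradient
w = ste_update(w, grad)                        # latent-weight update
```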
“…FINN [38] proposes a framework for binary neural network inference on FPGAs. ReBNN [10] focuses on reducing memory usage when training BNNs. WRPN [30] synthesizes an ASIC for multiple precisions, including binary.…”
Section: Related Work
confidence: 99%
“…Figure 3 shows the block diagram of the proposed VTC circuit design. It consists of a sampling circuit, an inverter, and a current source.…”
Section: B. Proposed Voltage-to-Time Converter (VTC)
confidence: 99%
“…Recently, research has focused on AI applications to address complex machine learning problems such as image/speech recognition and language translation [2]. Deep neural networks (DNNs) are widely utilized in such applications since they can achieve high accuracy [3]. However, DNN algorithms are computationally intensive, with large data sets that require high memory bandwidth.…”
Section: Introduction
confidence: 99%