2017
DOI: 10.3389/fnins.2017.00496

Hardware-Efficient On-line Learning through Pipelined Truncated-Error Backpropagation in Binary-State Networks

Abstract: Artificial neural networks (ANNs) trained using backpropagation are powerful learning architectures that have achieved state-of-the-art performance in various benchmarks. Significant effort has been devoted to developing custom silicon devices to accelerate inference in ANNs. Accelerating the training phase, however, has attracted relatively little attention. In this paper, we describe a hardware-efficient on-line learning technique for feedforward multi-layer ANNs that is based on pipelined backpropagation. …

Cited by 9 publications (10 citation statements)
References 47 publications

“…Uniformly distributed pseudo-random numbers can be generated cheaply using a linear feedback shift register (LFSR) (Klein, 2013), as implemented in Mostafa et al. (2017) on FPGAs. Generating new random numbers from LFSRs is computationally very cheap, as it involves only a few bit-wise XOR operations and no MAC operations.…”
Section: Methods
Mentioning confidence: 99%
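
As an aside on the LFSR technique described in the statement above: a Fibonacci LFSR produces uniformly distributed pseudo-random bits using only shifts and XORs, which is why no MAC operations are needed. Below is a minimal Python sketch; the 16-bit width, seed, and tap positions are illustrative assumptions and are not the configuration used in Mostafa et al. (2017).

```python
def lfsr_bits(seed=0xACE1, taps=(16, 14, 13, 11), width=16):
    """Yield one pseudo-random bit per step from a Fibonacci LFSR.

    The width, seed, and taps here are illustrative; any maximal-length
    tap set for the chosen width works the same way.
    """
    state = seed & ((1 << width) - 1)
    assert state != 0, "LFSR state must be non-zero"
    while True:
        out = state & 1                        # bit shifted out this cycle
        fb = 0
        for t in taps:                         # XOR the tapped bits -> feedback
            fb ^= (state >> (width - t)) & 1
        state = (state >> 1) | (fb << (width - 1))
        yield out

gen = lfsr_bits()
print([next(gen) for _ in range(32)])          # first 32 pseudo-random bits
```

In hardware the same structure is just a shift register plus a handful of XOR gates, consistent with the "only a few bit-wise XOR operations" point in the quoted statement.
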
“…For example, in the scheme used in Cauwenberghs (1996), the number of flip flops needed to produce N_b random bits per cycle grows with . This scheme was adopted in Mostafa et al. (2017), which used 60 flip flops and 650 XOR gates to generate 320 random bits every clock cycle. In the training experiments reported in this paper, we did not use an explicit LFSR, but instead used a standard software-based random number generator.…”
Section: Methods
Mentioning confidence: 99%
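
The last sentence of the statement above mentions replacing the explicit LFSR with a standard software random number generator during training experiments. A minimal sketch of that alternative, assuming NumPy's Generator API (the 320 bits per step simply mirrors the figure quoted above and is otherwise arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # standard software PRNG instead of an LFSR
N_b = 320                             # bits per step; mirrors the quoted figure

def random_bits(n=N_b):
    """Return n i.i.d. uniform 0/1 bits as an int8 array."""
    return rng.integers(0, 2, size=n, dtype=np.int8)

bits = random_bits()
print(bits[:16], "fraction of ones:", bits.mean())
```
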
“…A proposed solution to reduce the computational complexity and optimize memory resources is the use of pipelined backpropagation [31] and binary-state networks [32]. In a binary-state network, the output of a neuron is a unipolar (0/1) or bipolar (−1/+1) binary value.…”
Section: B. Backpropagation and Variants
Mentioning confidence: 99%
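
The unipolar (0/1) and bipolar (−1/+1) binary neuron outputs mentioned in this statement are simple thresholding functions. The sketch below also includes a straight-through-style surrogate gradient, one common way to backpropagate through such activations; the threshold-at-zero convention and the surrogate choice are assumptions for illustration, not details taken from the cited works.

```python
import numpy as np

def binary_unipolar(x):
    """Unipolar binary neuron: output in {0, 1}."""
    return (x >= 0).astype(np.float32)

def binary_bipolar(x):
    """Bipolar binary neuron: output in {-1, +1}."""
    return np.where(x >= 0, 1.0, -1.0).astype(np.float32)

def straight_through_grad(x, clip=1.0):
    """Surrogate derivative: pass gradients where |x| <= clip, block elsewhere.
    (A common convention, assumed here for illustration.)"""
    return (np.abs(x) <= clip).astype(np.float32)

pre_act = np.array([-1.5, -0.2, 0.0, 0.7, 2.3])
print(binary_unipolar(pre_act))        # [0. 0. 1. 1. 1.]
print(binary_bipolar(pre_act))         # [-1. -1.  1.  1.  1.]
print(straight_through_grad(pre_act))  # [0. 1. 1. 1. 0.]
```
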
“…However, they realized the idea only for a 3-layer perceptron on a torus of 16 processors. Mostafa et al. [13] implemented a proof-of-concept validation of pipelined backpropagation training for a 3-layer, fully connected binary-state neural network with truncated error on an FPGA. However, that implementation does not have coarse-grained, layer-wise pipelined parallelization.…”
Section: Literature Review
Mentioning confidence: 99%
“…Existing pipelined training approaches either avoid the use of stale weights (e.g., with the use of microbatches [8]), constrain the training to ensure the consistency of the weights within an accelerator (e.g., using weight stashing [9]), utilize weight adjustments (e.g., weight prediction [11]), or limit the use of pipelining to very small networks (e.g., [13]). However, these approaches underutilize accelerators [8], inflate memory usage to stash multiple copies of weights [9], or are unable to handle large networks [13].…”
Section: Introduction
Mentioning confidence: 99%
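
To make the stale-weight issue in pipelined training concrete, here is a small toy: delayed-gradient SGD on a linear model, where each update consumes an error term computed D steps earlier, as happens when forward and backward passes of different samples overlap in a pipeline. The model, delay, and learning rate are arbitrary illustrative choices, not the setup of any of the cited approaches.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 2                        # pipeline delay, in samples (illustrative)
lr = 0.05
w_true = np.array([1.5, -2.0])
w = np.zeros(2)
pending = []                 # (input, error) pairs awaiting their delayed update

for t in range(500):
    x = rng.normal(size=2)
    y = w_true @ x
    err = (w @ x) - y                  # error computed with the current weights
    pending.append((x, err))
    if len(pending) > D:               # the matching update arrives D steps later
        x_old, err_old = pending.pop(0)
        w -= lr * err_old * x_old      # gradient is stale by D steps

print("learned:", w, "target:", w_true)
```

With a small learning rate and a short delay, the toy still converges, which is the intuition behind accepting some staleness in exchange for pipeline utilization.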