Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs have higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has received limited research attention. TWNs on existing IMC devices are inefficient because the sparsity is not well utilized and the addition operation is not efficient. In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing back the carry to memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of activations and weights and to increase the parallelism across memory columns. Simulation results show that, for addition operations at the Sense Amplifier level, FAT achieves 2.00× speedup, 1.22× power efficiency and 1.22× area efficiency compared with the state-of-the-art IMC accelerator ParaPIM. On networks with 80% average sparsity, FAT achieves 10.02× speedup and 12.19× energy efficiency compared with ParaPIM.
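The sketch below is only an algorithm-level illustration of the idea behind skipping null operations on zero weights (it is not the FAT hardware design): with weights restricted to {-1, 0, +1}, every multiplication becomes an addition, a subtraction, or nothing at all.

    # Illustrative sketch (not the FAT hardware): a ternary-weight dot product
    # where zero weights are skipped, mirroring the Sparse Addition Control Unit idea.
    def ternary_dot(activations, weights):
        acc = 0
        for a, w in zip(activations, weights):
            if w == 0:                       # null operation: skip entirely
                continue
            acc += a if w == 1 else -a       # multiplication replaced by add/sub
        return acc

    # Example: 3 of 5 weights are zero, so 3 of 5 additions are skipped.
    print(ternary_dot([3, -1, 4, 1, 5], [0, 0, 1, 0, -1]))  # -> 4 - 5 = -1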
Accelerating the inference of Convolutional Neural Networks (CNNs) on edge devices is essential due to the small memory size and limited computation capability of these devices. Network quantization methods such as XNOR-Net, Bi-Real-Net, and XNOR-Net++ reduce the memory usage of CNNs by binarizing them. They also simplify the multiplication operations to bit-wise operations and obtain good speedup on edge devices. However, there are hidden redundancies in the computation pipeline of these methods, constraining the speedup of those binarized CNNs. In this paper, we propose XOR-Net as an optimized computation pipeline for binary networks both without and with scaling factors. As XNOR is realized by two instructions, XOR and NOT, on CPU/GPU platforms, XOR-Net avoids the NOT operations by using XOR instead of XNOR, thus reducing the bit-wise operations in both aforementioned kinds of binary convolution layers. For the binary convolution with scaling factors, our XOR-Net further rearranges the computation sequence of calculating and multiplying the scaling factors to reduce the full-precision operations. Theoretical analysis shows that XOR-Net reduces one-third of the bit-wise operations compared with traditional binary convolution, and up to 40% of the full-precision operations compared with XNOR-Net. Experimental results show that our XOR-Net binary convolution without scaling factors achieves up to 135× speedup and consumes no more than 0.8% of the energy compared with parallel full-precision convolution. For the binary convolution with scaling factors, XOR-Net is up to 17% faster and 19% more energy-efficient than XNOR-Net.
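A minimal sketch of the underlying arithmetic (not the exact XOR-Net kernels): with +1/-1 values packed as 1/0 bits, a binary dot product over N bits can be computed either as 2*popcount(XNOR(a, b)) - N or as N - 2*popcount(a XOR b). Since CPUs/GPUs have no single XNOR instruction, the XNOR form costs an extra NOT per word, which the XOR form avoids.

    # Sketch of a packed binary dot product using XOR + popcount.
    N = 64  # bits per packed word

    def binary_dot_xor(a: int, b: int) -> int:
        """Dot product of two packed {+1,-1} vectors: matches give +1, mismatches -1."""
        return N - 2 * bin(a ^ b).count("1")

    # Example: identical words give +N, fully complementary words give -N.
    a = 0x0F0F0F0F0F0F0F0F
    print(binary_dot_xor(a, a))                    # -> 64
    print(binary_dot_xor(a, a ^ ((1 << N) - 1)))   # -> -64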
A growing trend is to deploy deep learning algorithms in edge environments to mitigate the privacy and latency issues of cloud computing. Diverse edge deep learning accelerators have been devised to speed up the inference of deep learning algorithms on edge devices. These accelerators feature different characteristics in terms of power and performance, which makes it very challenging to compare different accelerators efficiently and uniformly. In this paper, we introduce EDLAB, an end-to-end benchmark, to evaluate the overall performance of edge deep learning accelerators. EDLAB consists of state-of-the-art deep learning models, a unified workload preprocessing and deployment framework, and a collection of comprehensive metrics. In addition, we propose parameterized models of the hardware performance bound so that EDLAB can identify the hardware potential and the hardware utilization of different deep learning applications. Finally, we employ EDLAB to benchmark three edge deep learning accelerators and analyze the benchmarking results. From the analysis we obtain insightful observations that can guide the design of efficient deep learning applications.
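As a hedged illustration of what a parameterized performance-bound model can look like, the roofline-style sketch below bounds attainable throughput by peak compute and memory bandwidth; the actual EDLAB models and all numbers used here (4 TOPS, 25 GB/s, the measured rate) are placeholders, not values from the paper.

    # Roofline-style sketch of a parameterized hardware performance bound.
    def performance_bound(peak_ops_per_s, mem_bw_bytes_per_s, ops, bytes_moved):
        intensity = ops / bytes_moved                        # arithmetic intensity (ops/byte)
        return min(peak_ops_per_s, mem_bw_bytes_per_s * intensity)

    # Hypothetical accelerator and layer: 4 TOPS peak, 25 GB/s bandwidth,
    # 2e9 ops with 50 MB of data traffic; measured throughput is also hypothetical.
    bound = performance_bound(4e12, 25e9, 2e9, 50e6)
    measured = 0.6e12
    print(f"bound = {bound:.2e} ops/s, utilization = {measured / bound:.1%}")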
Adder Neural Network (AdderNet) is a new type of Convolutional Neural Network (CNN) that replaces the computation-intensive multiplications in convolution layers with lightweight additions and subtractions. As a result, AdderNet preserves high accuracy with adder convolution kernels and achieves high speed and power efficiency. In-Memory Computing (IMC) is known as the next-generation artificial-intelligence computing paradigm and has been widely adopted for accelerating binary and ternary CNNs. As AdderNet has much higher accuracy than binary and ternary CNNs, accelerating AdderNet using IMC can obtain both performance and accuracy benefits. However, existing IMC devices have no dedicated subtraction function, and adding subtraction logic may bring larger area, higher power, and degraded addition performance. In this paper, we propose iMAD as an in-memory accelerator for AdderNet with efficient addition and subtraction operations. First, we propose an efficient in-memory subtraction operator at the circuit level and co-optimize the addition performance to reduce latency and power. Second, we propose an accelerator architecture for AdderNet with high parallelism based on the optimized operators. Third, we propose an IMC-friendly computation pipeline for AdderNet convolution at the algorithm level to further boost the performance. Evaluation results show that our accelerator iMAD achieves 3.25× speedup and 3.55× energy efficiency compared with a state-of-the-art in-memory accelerator.
CCS Concepts: • Computing methodologies → Neural networks; • Hardware → Emerging architectures; Spintronics and magnetic technologies.
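For intuition, the sketch below contrasts a standard multiply-accumulate kernel with an AdderNet-style kernel that accumulates absolute differences (a negative L1 distance), so only additions and subtractions are needed; this is an algorithm-level illustration only, not the iMAD circuit or pipeline.

    # Minimal sketch: AdderNet-style kernel vs. standard convolution kernel.
    def adder_kernel(x_patch, w_kernel):
        # Similarity as negative L1 distance: additions and subtractions only.
        return -sum(abs(x - w) for x, w in zip(x_patch, w_kernel))

    def mult_kernel(x_patch, w_kernel):
        # Standard convolution: multiply-accumulate.
        return sum(x * w for x, w in zip(x_patch, w_kernel))

    patch, kernel = [1.0, -2.0, 3.0], [0.5, -1.5, 2.0]
    print(adder_kernel(patch, kernel))   # -> -(0.5 + 0.5 + 1.0) = -2.0
    print(mult_kernel(patch, kernel))    # ->  0.5 + 3.0 + 6.0   =  9.5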
Ternary Neural Networks (TNNs) and mixed-precision Ternary Binary Networks (TBNs) have demonstrated higher accuracy than Binary Neural Networks (BNNs) while providing fast, low-power and memory-efficient inference. Related works have improved the accuracy of TNNs and TBNs but overlooked their optimization on CPU and GPU platforms. First, there is no unified encoding for the binary and ternary values in TNNs and TBNs. Second, existing works store the 2-bit quantized data sequentially in 32/64-bit integers, resulting in bit-extraction overhead. Last, adopting standard 2-bit multiplications for ternary values leads to a complex computation pipeline, and efficient mixed-precision multiplication between ternary and binary values is unavailable. In this paper, we propose TAB as a unified and optimized inference method for ternary, binary and mixed-precision neural networks. TAB includes a unified value representation, an efficient data storage scheme, and novel bitwise dot product pipelines on CPU/GPU platforms. We adopt signed integers for consistent value representation across binary and ternary values. We introduce a bitwidth-last data format that stores the first and second bits of the ternary values separately to remove the bit-extraction overhead. We design the ternary and binary bitwise dot product pipelines based on Gated-XOR, using up to 40% fewer operations than State-Of-The-Art (SOTA) methods. Theoretical speedup analysis shows that our proposed TAB-TNN is 2.3× as fast as the SOTA ternary method RTN, 9.8× as fast as 8-bit integer quantization (INT8), and 39.4× as fast as 32-bit full-precision convolution (FP32). Experimental results on CPU and GPU platforms show that our TAB-TNN achieves up to 34.6× speedup and 16× storage size reduction compared with FP32 layers. TBN, Binary-activation Ternary-weight Network (BTN) and BNN in TAB are up to 40.7×, 56.2× and 72.2× as fast as FP32, respectively. TAB-TNN is up to 70.1% faster and 12.8% more power-efficient than RTN on Darknet-19 while keeping the same accuracy. TAB is open source as a PyTorch extension for easy integration with existing CNN models.
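The following is a hedged sketch of a ternary bitwise dot product in the Gated-XOR spirit; the exact TAB encoding and kernels may differ. Each ternary vector is held in two separate bit-planes, matching the bitwidth-last idea: one plane marks nonzero positions and one plane marks the sign, so the sign XOR is "gated" by the nonzero mask and no bit extraction is needed.

    # Sketch: ternary dot product from two bit-planes per vector (LSB-first packing).
    #   nz:   bit = 1 where the value is nonzero
    #   sign: bit = 1 where the value is -1
    def ternary_dot(nz_a, sign_a, nz_b, sign_b):
        gate = nz_a & nz_b                   # positions where both values are nonzero
        diff = (sign_a ^ sign_b) & gate      # gated XOR: mismatched signs contribute -1
        same = bin(gate).count("1") - bin(diff).count("1")
        return same - bin(diff).count("1")   # (+1 per match) + (-1 per mismatch)

    # Example: a = [+1, 0, -1, -1], b = [+1, -1, +1, 0]
    #   elementwise products: +1, 0, -1, 0  ->  dot = 0
    print(ternary_dot(0b1101, 0b1100, 0b0111, 0b0010))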