Redundancy is now routinely allocated at the circuit, microarchitecture, or system level to mitigate mounting manufacturing yield losses. In this paper, we propose spare lane sharing, which reduces the cost of multi-core SIMT systems by allowing either of two neighboring cores to use a shared redundant lane when necessary. We have evaluated the performance-cost trade-offs of core-, lane-, and shared-lane-sparing across a variety of benchmarks, and found that for nearly all applications shared-lane-sparing outperforms lane-sparing, reducing cost by up to 20%.
Energy consumption and latency are two important factors that limit the application of convolutional neural networks (CNNs), particularly on embedded devices. Fourier-based frequency domain (FD) convolution is a promising low-cost alternative to conventional spatial domain (SD) implementations for CNNs. FD convolution carries out the operation with point-wise multiplications. However, in CNNs, the overhead of Fourier-based FD convolution exceeds its computational savings for small filter sizes. In this work, we propose to implement convolutional layers in the FD using the Hartley transform (HT) instead of the Fourier transform. We show that the HT can reduce convolution delay and energy consumption even for small filters. By applying the HT to the parameters, we replace convolution with point-wise multiplications. The HT also lets us compress the input feature maps of convolutional layers before convolving them with filters. To this end, we introduce two compression techniques: fixed-rate and adaptive-rate. In fixed-rate compression, we select frequency domain input feature map (IFMap) coefficients with a constant pattern across all convolutional layers. In adaptive-rate IFMap compression, the network itself learns during training which coefficients to keep or discard. To optimize the hardware implementation of both methods (fixed- and adaptive-rate compression), we use stochastic computing (SC) to perform the point-wise multiplications in the FD, and we reformulate the HT to better match SC. Compared to conventional Fourier-based convolution, Hartley SC-based convolution achieves a 1.33x speedup and a 23% energy reduction on a Virtex 7 FPGA when implementing AlexNet on CIFAR-10 with fixed-rate compression. With adaptive-rate compression, latency improves by a further 16% and energy consumption drops by a further 15% compared to the fixed-rate method.
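To illustrate how a Hartley transform can turn convolution into point-wise products, here is a minimal sketch of 1-D circular convolution via the discrete Hartley transform (DHT). It is not the authors' implementation: it uses a naive O(N^2) transform, leaves out the compression and stochastic computing steps, and all function names (`dht`, `idht`, `hartley_circular_conv`, `direct_circular_conv`) are hypothetical. It relies only on the standard DHT convolution theorem, in which the filter's transform is split into even and odd parts.

```python
import math

def dht(x):
    """Naive discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*n*k/N),
    where cas(t) = cos(t) + sin(t). Real input gives real output."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for i, v in enumerate(x):
            angle = 2.0 * math.pi * i * k / n
            acc += v * (math.cos(angle) + math.sin(angle))
        out.append(acc)
    return out

def idht(h):
    """The DHT is its own inverse up to a 1/N scale factor."""
    n = len(h)
    return [v / n for v in dht(h)]

def hartley_circular_conv(x, w):
    """Circular convolution of equal-length sequences x and w via the DHT.
    Assumes w has already been zero-padded to len(x)."""
    n = len(x)
    hx, hw = dht(x), dht(w)
    hz = []
    for k in range(n):
        km = (-k) % n                       # index of the mirrored frequency
        even = 0.5 * (hw[k] + hw[km])       # even part of the filter transform
        odd = 0.5 * (hw[k] - hw[km])        # odd part of the filter transform
        hz.append(hx[k] * even + hx[km] * odd)
    return idht(hz)

def direct_circular_conv(x, w):
    """Reference spatial-domain circular convolution, for comparison."""
    n = len(x)
    return [sum(x[m] * w[(i - m) % n] for m in range(n)) for i in range(n)]

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    w = [1.0, 0.0, -1.0, 0.0]               # filter zero-padded to len(x)
    print(hartley_circular_conv(x, w))
    print(direct_circular_conv(x, w))
```

The even and odd parts of the filter transform depend only on the filter, so they can be precomputed once, leaving two real multiplications per output coefficient at inference time. When the filter transform is symmetric (`hw[k] == hw[-k]`), the odd part vanishes and the product becomes a single point-wise multiplication.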