S. H. Mozafari scite author profile

Redundancy is now routinely allocated in circuits, microarchitectural structures, or at the system level, to mitigate mounting manufacturing yield losses. In this paper, we propose spare lane sharing, which reduces the cost of multi-core SIMT systems by allowing one of two neighboring cores to make use of a redundant lane if necessary. We have evaluated the performance-cost trade-offs of core-, lane-, and shared-lane-sparing under a variety of benchmarks, and found that for nearly all applications shared-lane-sparing outperforms lane-sparing, reducing cost by up to 20%.

Work-in-Progress: Utilizing latency and accuracy predictors for efficient hardware-aware NAS

Firouzian

Mozafari

Clark

et al. 2022

Hot spare components for performance-cost improvement in multi-core SIMT

Mozafari

Meyer

2015

Implementing Convolutional Neural Networks Using Hartley Stochastic Computing With Adaptive Rate Feature Map Compression

Mozafari

Clark

Gross

et al. 2021

IEEE Open J. Circuits Syst.

Energy consumption and latency are two important factors that limit the application of convolutional neural networks (CNNs), especially on embedded devices. Fourier-based frequency-domain (FD) convolution is a promising low-cost alternative to conventional spatial-domain (SD) implementations for CNNs. FD convolution performs its operation with point-wise multiplications. However, in CNNs, the overhead of Fourier-based FD convolution surpasses its computational savings for small filter sizes. In this work, we propose to implement convolutional layers in the FD using the Hartley transform (HT) instead of the Fourier transform. We show that the HT can reduce convolution delay and energy consumption even for small filters. With the HT of the parameters, we replace convolution with point-wise multiplications. The HT also lets us compress the input feature maps of convolutional layers before convolving them with filters. To this end, we introduce two compression techniques: fixed-rate and adaptive-rate. In fixed-rate compression, we select frequency-domain input feature map (IFMap) coefficients with a constant pattern over all convolutional layers. In adaptive-rate IFMap compression, the network itself learns during training which coefficients to keep or discard. To optimize the hardware implementation of both methods (fixed- and adaptive-rate compression), we use stochastic computing (SC) to perform the point-wise multiplications in the FD, and we reformulate the HT to better match SC. We show that, compared to conventional Fourier-based convolution, Hartley SC-based convolution achieves a 1.33x speedup and a 23% energy reduction on a Virtex 7 FPGA when implementing AlexNet on CIFAR-10 with fixed-rate compression. With adaptive-rate compression, latency and energy consumption improve by a further 16% and 15%, respectively, compared to the fixed-rate method.
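To make the idea of Hartley-domain convolution concrete, the following is a minimal one-dimensional sketch in pure Python. It is not the authors' implementation (which targets 2-D CNN layers, feature map compression, and stochastic computing on an FPGA); it only illustrates the textbook discrete Hartley transform and its circular convolution theorem. All function names (`dht`, `idht`, `hartley_circular_conv`, `direct_circular_conv`) are hypothetical and chosen for illustration.

```python
import math


def dht(x):
    """Discrete Hartley transform: H[k] = sum_j x[j] * cas(2*pi*j*k/n),
    where cas(t) = cos(t) + sin(t). Naive O(n^2) version for clarity."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for j in range(n):
            t = 2.0 * math.pi * j * k / n
            acc += x[j] * (math.cos(t) + math.sin(t))
        out.append(acc)
    return out


def idht(h):
    """The DHT is its own inverse up to a factor of 1/n."""
    n = len(h)
    return [v / n for v in dht(h)]


def hartley_circular_conv(x, y):
    """Circular convolution computed in the Hartley domain.
    Uses only real arithmetic: each output coefficient combines the
    coefficients at index k and at the mirrored index -k (mod n)."""
    n = len(x)
    X, Y = dht(x), dht(y)
    Z = []
    for k in range(n):
        km = (-k) % n  # mirrored frequency index
        Z.append(0.5 * (X[k] * (Y[k] + Y[km]) + X[km] * (Y[k] - Y[km])))
    return idht(Z)


def direct_circular_conv(x, y):
    """Reference circular convolution in the original (spatial) domain."""
    n = len(x)
    return [sum(x[j] * y[(i - j) % n] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [1.0, 0.0, -1.0, 0.0]
    print(hartley_circular_conv(x, y))
    print(direct_circular_conv(x, y))
```

In this sketch the transform stays entirely in real arithmetic, unlike a Fourier-based version, which works with complex coefficients. When one operand has a symmetric Hartley spectrum (`Y[k] == Y[-k]`), the combination above reduces to the plain point-wise product `X[k] * Y[k]`.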

Efficient Performance Evaluation of Multi-Core SIMT Processors with Hot Redundancy

Yield-aware Performance-Cost Characterization for Multi-Core SIMT
