2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01204

Butterfly Transform: An Efficient FFT Based Neural Architecture Design

Abstract: In this paper, we introduce the Butterfly Transform (BFT), a lightweight channel fusion method that reduces the computational complexity of point-wise convolutions from O(n^2) of conventional solutions to O(n log n) with respect to the number of channels, while improving the accuracy of the networks under the same range of FLOPs. The proposed BFT generalizes the Discrete Fourier Transform in a way that its parameters are learned at training time. Our experimental evaluations show that replacing channel fusion…
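The complexity claim in the abstract follows from the butterfly connectivity pattern itself: n channels are fused through log2(n) stages, and in each stage every channel passes through exactly one learned 2x2 mixing unit, giving O(n log n) multiply-adds per spatial position instead of the O(n^2) of a dense point-wise convolution. The snippet below is a minimal NumPy sketch of that pattern, not the authors' released implementation; the weight layout and pairing order are assumptions made for clarity.

# Minimal sketch of butterfly-style channel fusion (illustrative, not the paper's code).
import numpy as np

def butterfly_fusion(x, weights):
    """x: (n, h, w) feature map, n a power of two.
    weights: list of log2(n) arrays, each of shape (n // 2, 2, 2),
    holding one learned 2x2 mixing block per channel pair per stage."""
    n = x.shape[0]
    stages = int(np.log2(n))
    out = x.copy()
    for s in range(stages):
        stride = n >> (s + 1)            # pairing distance halves at each stage
        new = np.empty_like(out)
        pair = 0
        for start in range(0, n, 2 * stride):
            for i in range(start, start + stride):
                j = i + stride
                a, b = out[i], out[j]
                w = weights[s][pair]     # learned 2x2 block for this channel pair
                new[i] = w[0, 0] * a + w[0, 1] * b
                new[j] = w[1, 0] * a + w[1, 1] * b
                pair += 1
        out = new
    return out

# Usage: 8 channels -> 3 stages, 4 channel pairs (4 learned 2x2 blocks) per stage.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
weights = [rng.standard_normal((4, 2, 2)) for _ in range(3)]
y = butterfly_fusion(x, weights)
print(y.shape)  # (8, 5, 5)

Each stage costs about 2n multiply-adds per spatial position, so the total is roughly 2n log2(n) rather than n^2, which is the source of the FLOP savings the abstract describes.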

Cited by 25 publications (23 citation statements)
References 34 publications
“…Residual connections have been proposed to connect the butterfly factors [Vahid et al, 2020]. We show that residual products of butterfly matrices have a first-order approximation as a sparse matrix with a fixed sparsity.…”
Section: Flat Butterfly Matrices (mentioning; confidence: 98%)
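The first-order claim in the statement above can be checked numerically: each butterfly factor B_s is block-sparse (it only mixes channel i with its stage-s partner), so the residual product over stages of (I + eps*B_s) expands to I + eps * sum_s B_s plus O(eps^2) cross terms, and the leading term is a sparse matrix whose sparsity pattern is fixed by the butterfly wiring. The snippet below is our own hedged illustration of that expansion, not code from either paper.

# Illustrative check of the first-order (fixed-sparsity) approximation.
import numpy as np

def butterfly_factor(n, stride, rng):
    # Block-sparse n x n factor: channel i is mixed only with its partner i XOR stride.
    B = np.zeros((n, n))
    for i in range(n):
        j = i ^ stride
        B[i, i], B[i, j] = rng.standard_normal(2)
    return B

rng = np.random.default_rng(0)
n, eps = 8, 1e-3
factors = [butterfly_factor(n, 1 << s, rng) for s in range(3)]   # strides 1, 2, 4

exact = np.eye(n)
for B in factors:                               # residual product (I + eps*B) applied stage by stage
    exact = (np.eye(n) + eps * B) @ exact

first_order = np.eye(n) + eps * sum(factors)    # sparse approximation with a fixed pattern
print(np.abs(exact - first_order).max())        # on the order of eps^2 (~1e-6)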
“…General compressing methods for DCNNs include: 1) Quantization: Although quantization methods don't reduce the number of operations, they can reduce the DCNN model and the computation cost by altering floating-point to fixed-point operations that use simple circuitry in hardware. DoReFaNet [24] and QKeras [25] are frameworks that allow quantizing both weights and feature maps (fmaps) to any […]…”
[Residue of the citing paper's Fig. 3 caption: … [3], (C) Butterfly Transform [17], (D) ShuffleNet [18], (E) SqueezeNet [19], (F) Low-Rank Expansion [20], (G) PermDNN [21], (H) CSC blocks (this work); Scheme-1, Scheme-2.]
Section: Related Work (mentioning; confidence: 99%)
“…Below each graph, the connectivity matrices, from left to right, represent the topology of the graph from right to left. Compact models such as Low-Rank Expansion [20], MobileNet [3], ShuffleNet [18], SqueezeNet [19], PermDNN [21], PermCNN [9], and Butterfly Transform [17], as illustrated in Fig. 3, introduce alternate pre-sparsified layers with a common intuition: in all of them, each model proposes a pre-defined factorization that can be equated to a standard DCNN layer.…”
Section: Related Work (mentioning; confidence: 99%)
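To make the "pre-defined factorization that can be equated to a standard DCNN layer" concrete, the sketch below uses the simplest of the listed schemes, a low-rank expansion in the spirit of [20]: a dense point-wise convolution W of size n x n is replaced by two thin factors so the layer keeps the same input/output shape while the per-pixel cost drops from n^2 to 2nr multiplies. The shapes and rank here are illustrative assumptions, not values taken from the cited papers.

# Illustrative low-rank factorization of a 1x1 convolution (assumed shapes).
import numpy as np

rng = np.random.default_rng(0)
n, r, h, w = 64, 8, 14, 14
x = rng.standard_normal((n, h * w))           # feature map flattened over spatial positions

W = rng.standard_normal((n, n))               # standard dense point-wise convolution
U, V = rng.standard_normal((n, r)), rng.standard_normal((r, n))

dense_out = W @ x                             # n^2 multiplies per pixel
lowrank_out = U @ (V @ x)                     # 2*n*r multiplies per pixel, same output shape
print(dense_out.shape, lowrank_out.shape)     # (64, 196) (64, 196)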
“…Second, the performance gains of CNNs might come at a high computational cost. While an abundance of computing resources might be available at the training phase of CNNs, the resulting inference engines may be deployed in settings such as the network edge [53] that are constrained in terms of computational resources and energy consumption and favor tight coupling with the RF circuits (sensing component) [54, 55, 56]. Unless addressed, the high computation and energy cost of CNNs might be a significant limiting factor towards broader adoption.…”
Section: Introduction (mentioning; confidence: 99%)