2019
DOI: 10.48550/arxiv.1903.05895
Preprint

Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations

Abstract: Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural priors they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform.
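To make the "sparse factors" idea concrete, here is a minimal numpy sketch of the classic Cooley-Tukey butterfly factorization that the paper generalizes (this illustrates the fixed FFT factorization, not the paper's learned parameterization): the dense n×n DFT matrix factors into O(log n) sparse butterfly stages, which is exactly the structure behind its O(n log n) algorithm.

```python
import numpy as np

def butterfly_factor(n):
    """B_n = [[I, D], [I, -D]] with D = diag(w^0, ..., w^(n/2-1)), w = exp(-2*pi*i/n)."""
    m = n // 2
    D = np.diag(np.exp(-2j * np.pi * np.arange(m) / n))
    I = np.eye(m)
    return np.block([[I, D], [I, -D]])

def even_odd_perm(n):
    """Permutation matrix that sorts indices into evens followed by odds."""
    P = np.zeros((n, n))
    P[np.arange(n), np.r_[np.arange(0, n, 2), np.arange(1, n, 2)]] = 1
    return P

def dft_via_butterflies(n):
    """Build the dense DFT matrix as a product of sparse butterfly factors:
    F_n = B_n (I_2 kron F_{n/2}) P_n, applied recursively."""
    if n == 1:
        return np.ones((1, 1), dtype=complex)
    F_half = dft_via_butterflies(n // 2)
    block = np.kron(np.eye(2), F_half)          # I_2 kron F_{n/2}
    return butterfly_factor(n) @ block @ even_odd_perm(n)

n = 8
F = dft_via_butterflies(n)
assert np.allclose(F, np.fft.fft(np.eye(n)))    # matches the dense DFT matrix
```

Each recursion level contributes one sparse factor with O(n) nonzeros, so the full product has O(n log n) nonzeros; the paper's contribution is learning such factorizations from data rather than deriving them by hand.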



Cited by 3 publications (2 citation statements) · References 27 publications
“…The procedure explained in Algorithm 1 can be represented by a butterfly graph similar to the FFT's graph. The butterfly network structure has been used for function representation [26] and for fast factorizations approximating linear transformations [6]. We adopt this graph as an architecture design for the layers of a neural network.…”
Section: Butterfly Neural Network
confidence: 99%
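The quoted passage uses the butterfly graph as a layer wiring pattern. A minimal PyTorch sketch of that idea follows; `ButterflyLayer` is a hypothetical illustration, not the cited works' exact architecture. It applies log2(n) stages of learnable 2×2 mixing blocks along FFT-style strides, giving O(n log n) parameters per layer instead of a dense layer's n².

```python
import torch
import torch.nn as nn

class ButterflyLayer(nn.Module):
    """Hypothetical sketch: a linear layer wired as an FFT-style butterfly
    graph, with a learnable 2x2 block for each butterfly pair at each of
    the log2(n) stages."""
    def __init__(self, n: int):
        super().__init__()
        assert n > 1 and n & (n - 1) == 0, "n must be a power of two"
        self.n = n
        self.num_stages = n.bit_length() - 1
        # one learnable 2x2 block per pair per stage: (stages, n/2, 2, 2)
        self.blocks = nn.Parameter(torch.randn(self.num_stages, n // 2, 2, 2) / 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, n)
        b = x.shape[0]
        for s in range(self.num_stages):
            stride = 1 << s
            q = self.n // (2 * stride)
            # pair entry i with entry i + stride inside blocks of size 2*stride
            y = x.view(b, q, 2, stride).permute(0, 1, 3, 2)    # (b, q, stride, 2)
            t = self.blocks[s].view(q, stride, 2, 2)
            y = torch.einsum('qsij,bqsj->bqsi', t, y)          # 2x2 mix per pair
            x = y.permute(0, 1, 3, 2).reshape(b, self.n)
        return x

layer = ButterflyLayer(8)
out = layer(torch.randn(4, 8))   # shape (4, 8)
```

With the 2×2 blocks fixed to DFT twiddle factors this wiring recovers the FFT; leaving them learnable is what lets such a layer represent a broader family of fast transforms.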
“…Another way to reduce the model (and optimizer) memory required for storing and training a neural network is to replace weight matrices with special structured matrices, such as low-rank matrices [Sainath et al., 2013], Toeplitz-like matrices [Sindhwani et al., 2015], block-circulant matrices [Cheng et al., 2015, Ding et al., 2017], Fastfood transforms [Yang et al., 2015], low displacement rank matrices [Thomas et al., 2018], and butterfly matrices [Dao et al., 2019].…”
Section: Other Techniques
confidence: 99%
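For a concrete sense of the savings such structured substitutions buy, here is a sketch of the simplest entry in that list, a low-rank weight matrix; this is a hypothetical illustration, not any cited paper's exact construction. Butterfly matrices [Dao et al., 2019] make a similar trade at O(n log n) parameters.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Hypothetical sketch: replace an n x n dense weight with a rank-r
    factorization W ~= U @ V, cutting parameters from n^2 to 2*n*r."""
    def __init__(self, n: int, r: int):
        super().__init__()
        self.V = nn.Linear(n, r, bias=False)   # project n -> r
        self.U = nn.Linear(r, n, bias=False)   # expand r -> n

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.U(self.V(x))

n, r = 1024, 16
dense = nn.Linear(n, n, bias=False)
low_rank = LowRankLinear(n, r)
print(sum(p.numel() for p in dense.parameters()))     # 1048576
print(sum(p.numel() for p in low_rank.parameters()))  # 32768
```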