Neural networks (NNs) have been extremely successful across many tasks in machine learning. Quantization of NN weights has become an important topic due to its impact on energy efficiency, inference time and hardware deployment. Although post-training quantization is well studied, training optimal quantized NNs involves combinatorial non-convex optimization problems which appear intractable. In this work, we introduce a convex optimization strategy to train quantized NNs with polynomial activations. Our method leverages hidden convexity in two-layer neural networks from the recent literature, semidefinite lifting, and Grothendieck's identity. Surprisingly, we show that certain quantized NN problems can be solved to global optimality in polynomial time in all relevant parameters via semidefinite relaxations. We present numerical examples to illustrate the effectiveness of our method. Our techniques lead, for the first time, to a provable guarantee on the gap between the resulting loss and the optimal non-convex loss.
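For completeness, we recall Grothendieck's identity mentioned above; the statement below is the classical one, with our own notation chosen only for illustration. For unit vectors u, v ∈ R^d and a standard Gaussian vector g ~ N(0, I_d),

E[ sign(gᵀu) sign(gᵀv) ] = (2/π) arcsin(uᵀv),

which relates the correlation of one-bit (sign-quantized) projections to the inner product of the underlying continuous vectors; this is the kind of connection that semidefinite relaxations of sign-constrained problems typically exploit.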
Prior work. Recently, there has been substantial research effort on the compression and quantization of neural networks for real-time implementations. In [41], the authors proposed a method that reduces network weights to ternary values by training directly with ternary values. Experiments in [41] show that their method does not suffer from performance degradation and achieves 16x compression compared to the original model. The authors in [18] focus on compressing dense layers using quantization to address storage problems for large-scale models. The work presented in [21] also aims to compress deep networks using a combination of pruning, quantization and Huffman coding. In [30], the authors present a quantization scheme that uses different bit-widths for different layers (i.e., bit-width optimization). Other works that deal with fixed-point training include [29], [19], [22]. Furthermore, [4] proposes layer-wise quantization based on ℓ2-norm error minimization followed by retraining of the quantized weights. However, these studies do not address optimal approximation; in comparison, our approach provides optimal quantized neural networks.

In [2], it was shown that degree-two polynomial activation functions perform comparably to the ReLU activation in practical deep networks. Specifically, it was reported in [2] that for deep neural networks, the ReLU activation achieves a classification accuracy of 0.96 while a degree-two polynomial activation yields an accuracy of 0.95 on the CIFAR-10 dataset. Similarly, on the CIFAR-100 dataset, it is possible to obtain an accuracy of 0.81 with the ReLU activation and 0.76 with the degree-two polynomial activation. These numerical results are obtained for the activation σ(t) = t + 0.1t². Furthermore, in encrypted computing it is desirable to have low-degree polynomials as activation functions; for instance, homomorphic encryption methods can only support additions and multiplications in a straightforward way. These constraints make low-degree polynomial activations a natural choice in such settings.
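To make the activation concrete, the following is a minimal sketch of our own (not taken from [2] or any other cited work): a two-layer network with σ(t) = t + 0.1t², together with a simple heuristic ternary quantization of its hidden weights. All dimensions, thresholds and weight values are placeholders.

```python
import numpy as np

def poly_activation(t):
    # Degree-two polynomial activation sigma(t) = t + 0.1 * t^2,
    # built only from additions and multiplications.
    return t + 0.1 * t ** 2

def two_layer_forward(X, W1, w2):
    # Two-layer network: inner products, polynomial activation, linear output layer.
    # X: (n, d) data, W1: (d, m) hidden weights, w2: (m,) output weights.
    return poly_activation(X @ W1) @ w2

rng = np.random.default_rng(0)
d, m, n = 8, 16, 32                      # placeholder dimensions
X = rng.standard_normal((n, d))
W1 = rng.standard_normal((d, m))
w2 = rng.standard_normal(m)

# Heuristic ternary quantization of the hidden weights to {-1, 0, +1} times a
# per-layer scale; this only illustrates the kind of constraint a quantized
# training problem imposes, not the method proposed in this paper.
scale = np.abs(W1).mean()
W1_q = scale * np.sign(W1) * (np.abs(W1) > 0.5 * scale)

print(two_layer_forward(X, W1, w2)[:3])    # full-precision outputs
print(two_layer_forward(X, W1_q, w2)[:3])  # outputs with quantized hidden weights
```

Because σ involves only additions and multiplications, such a forward pass is compatible with the arithmetic supported by homomorphic encryption schemes, which is the motivation noted above.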