Neural networks (NNs) have been extremely successful across many tasks in machine learning. Quantization of NN weights has become an important topic due to its impact on energy efficiency, inference time and hardware deployment. Although post-training quantization is well studied, training optimal quantized NNs involves combinatorial non-convex optimization problems which appear intractable. In this work, we introduce a convex optimization strategy to train quantized NNs with polynomial activations. Our method leverages hidden convexity in two-layer neural networks from the recent literature, semidefinite lifting, and Grothendieck's identity. Surprisingly, we show that certain quantized NN problems can be solved to global optimality in polynomial time in all relevant parameters via semidefinite relaxations. We present numerical examples to illustrate the effectiveness of our method. Our techniques lead, for the first time, to a provable guarantee on the gap between the resulting loss and the optimal non-convex loss.
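For completeness, we recall Grothendieck's identity mentioned above; the statement below is the classical one, with our own notation chosen only for illustration. For unit vectors u, v ∈ R^d and a standard Gaussian vector g ~ N(0, I_d),

E[ sign(gᵀu) sign(gᵀv) ] = (2/π) arcsin(uᵀv),

which relates the correlation of one-bit (sign-quantized) projections to the inner product of the underlying continuous vectors; this is the kind of connection that semidefinite relaxations of sign-constrained problems typically exploit.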
Prior work. Recently, there has been substantial research effort on the compression and quantization of neural networks for real-time implementations. In [41], the authors proposed a method that reduces network weights to ternary values by training directly with ternary values. Experiments in [41] show that their method does not suffer from performance degradation and achieves 16x compression compared to the original model. The authors in [18] focus on compressing dense layers using quantization to address storage problems for large-scale models. The work presented in [21] also aims to compress deep networks using a combination of pruning, quantization and Huffman coding. In [30], the authors present a quantization scheme that uses different bit-widths for different layers (i.e., bit-width optimization). Other works that deal with fixed-point training include [29], [19], [22]. Furthermore, [4] proposes layer-wise quantization based on ℓ2-norm error minimization followed by retraining of the quantized weights. However, these studies do not address optimal approximation; in comparison, our approach provides optimal quantized neural networks.

In [2], it was shown that degree-two polynomial activation functions perform comparably to the ReLU activation in practical deep networks. Specifically, it was reported in [2] that for deep neural networks, the ReLU activation achieves a classification accuracy of 0.96 while a degree-two polynomial activation yields an accuracy of 0.95 on the CIFAR-10 dataset. Similarly, on the CIFAR-100 dataset, it is possible to obtain an accuracy of 0.81 with the ReLU activation and 0.76 with the degree-two polynomial activation. These numerical results are obtained for the activation σ(t) = t + 0.1t². Furthermore, in encrypted computing it is desirable to have low-degree polynomials as activation functions; for instance, homomorphic encryption methods can only support additions and multiplications in a straightforward way. These constraints make low-degree polynomial activations a natural choice in such settings.
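To make the activation concrete, the following is a minimal sketch of our own (not taken from [2] or any other cited work): a two-layer network with σ(t) = t + 0.1t², together with a simple heuristic ternary quantization of its hidden weights. All dimensions, thresholds and weight values are placeholders.

```python
import numpy as np

def poly_activation(t):
    # Degree-two polynomial activation sigma(t) = t + 0.1 * t^2,
    # built only from additions and multiplications.
    return t + 0.1 * t ** 2

def two_layer_forward(X, W1, w2):
    # Two-layer network: inner products, polynomial activation, linear output layer.
    # X: (n, d) data, W1: (d, m) hidden weights, w2: (m,) output weights.
    return poly_activation(X @ W1) @ w2

rng = np.random.default_rng(0)
d, m, n = 8, 16, 32                      # placeholder dimensions
X = rng.standard_normal((n, d))
W1 = rng.standard_normal((d, m))
w2 = rng.standard_normal(m)

# Heuristic ternary quantization of the hidden weights to {-1, 0, +1} times a
# per-layer scale; this only illustrates the kind of constraint a quantized
# training problem imposes, not the method proposed in this paper.
scale = np.abs(W1).mean()
W1_q = scale * np.sign(W1) * (np.abs(W1) > 0.5 * scale)

print(two_layer_forward(X, W1, w2)[:3])    # full-precision outputs
print(two_layer_forward(X, W1_q, w2)[:3])  # outputs with quantized hidden weights
```

Because σ involves only additions and multiplications, such a forward pass is compatible with the arithmetic supported by homomorphic encryption schemes, which is the motivation noted above.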