We consider distributed optimization over a $d$-dimensional space, where $K$ remote clients send coded gradient estimates over an additive Gaussian Multiple Access Channel (MAC) with noise variance $\sigma_z^2$. Furthermore, the codewords transmitted by the clients must satisfy an average power constraint $P$, resulting in a signal-to-noise ratio (SNR) of $KP/\sigma_z^2$. In this paper, we study the fundamental limits imposed by the MAC on the convergence rate of any distributed optimization algorithm and design optimal communication schemes that achieve these limits. Our first result is a lower bound on the convergence rate, showing that communicating over a MAC imposes a slowdown of $\frac{d}{\frac{1}{2}\log(1+\mathrm{SNR})}$ on any protocol compared to the centralized setting. Next, we design a computationally tractable digital communication scheme that, when combined with a projected stochastic gradient descent algorithm, matches the lower bound up to a logarithmic factor in $K$. At the heart of our communication scheme is a careful combination of several compression and modulation ideas, including quantization along random bases, Wyner-Ziv compression, modulo-lattice decoding, and amplitude shift keying. We also show that analog schemes, which are popular due to their ease of implementation, can attain close to optimal convergence rates at low SNR but suffer a slowdown of roughly $\sqrt{d}$ at high SNR.
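As an illustrative reading of the lower bound stated above (the quantities $T_{\mathrm{MAC}}$ and $T_{\mathrm{cent}}$ are introduced here only for exposition and are not defined in this abstract), the slowdown can be sketched as follows:
% Sketch of the slowdown factor using only quantities appearing in this abstract.
% T_{MAC} and T_{cent} are hypothetical shorthands for the number of iterations
% needed to reach a given accuracy over the MAC and in the centralized setting.
\[
  \mathrm{SNR} = \frac{KP}{\sigma_z^2},
  \qquad
  T_{\mathrm{MAC}} \;\gtrsim\; \frac{d}{\tfrac{1}{2}\log\!\bigl(1 + \mathrm{SNR}\bigr)}\; T_{\mathrm{cent}}.
\]
The precise convergence measure, constants, and assumptions under which this comparison holds are specified in the body of the paper.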