The introduction of multiple viewpoints inevitably increases the bitrates to store and transmit video scenes. To reduce the compressed bitrates, researchers have developed to skip intermediate viewpoints during compression and delivery, and finally reconstruct them with Side Information (SI). Generally, the depth maps can be utilized to construct SI; however, it shows inferior performance with inaccurate reconstruction or high bitrates. In this paper, we propose a multi-view video coding based on SI of Generative Adversarial Network (GAN). At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and further utilize convolutional network to extract the latent code of GAN as SI; while at the decoder side, we combine the SI and adjacent viewpoints to reconstruct intermediate views by the generator of GAN. In particular, we set a joint encoder constraint of reconstruction cost and SI entropy, in order to achieve an optimal tradeoff between reconstruction quality and bitrate overhead. Experiments show a significantly improved Rate-Distortion (RD) performance compared with the state-of-the-art methods.