In this paper, we present an encoder-decoder architecture that exploits global and local semantics for automatic image colorization. For global semantics, the low-level encoder features are fine-tuned through scene-context classification so that they integrate the global image style. The architecture also handles the uncertainty of, and the relations among, scene styles by using label smoothing and weights pre-trained on Places365. For local semantics, three branches are trained jointly so that they benefit one another at the pixel level: the regression branch produces an average color estimate, the soft-encoding branch produces a multi-modal color distribution, and the segmentation branch determines which object each pixel belongs to. Our experiments, in which the model is trained on the COCO-Stuff dataset and validated on DIV2K, Places365, and ImageNet, yield encouraging results.
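To make the label smoothing used for scene-context classification concrete, the following is a minimal sketch in Python with NumPy. It assumes the common uniform-smoothing variant, a smoothing factor of 0.1, and 365 scene classes (the size of the Places365 label set). The function names `smooth_labels` and `cross_entropy` and the value of `epsilon` are illustrative assumptions, not the exact configuration used in this work.

```python
import numpy as np


def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for one scene label.

    Assumption: uniform smoothing, where a mass of `epsilon` is spread
    evenly over all classes and the rest stays on the true class.
    """
    target = np.full(num_classes, epsilon / num_classes, dtype=np.float64)
    target[class_index] += 1.0 - epsilon
    return target


def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a soft target vector."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_probs).sum())


# Hypothetical usage: 365 scene classes, true class index 42.
num_scene_classes = 365
target = smooth_labels(42, num_scene_classes, epsilon=0.1)
logits = np.zeros(num_scene_classes)  # placeholder classifier output
loss = cross_entropy(logits, target)
```

Because the smoothed target keeps a small probability on every scene class, the classifier is not pushed toward fully confident predictions, which is one way to account for scenes whose style is ambiguous or shared between categories.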