There has been an increasing interest in reducing the computational cost to develop efficient deep convolutional neural networks (DCNN) for real-time semantic segmentation. In this paper, we introduce an efficient convolution method, Spatio-Channel dilated convolution (SCDC) which is composed of structured sparse kernels based on the principle of split-transform-merge. Specifically, it employs the kernels whose shapes are dilated, not only in spatial domain, but also in channel domain, using a channel sampling approach. Based on SCDC, we propose an efficient convolutional module named Efficient Spatio-Channel dilated convolution (ESC). With ESC modules, we further propose ESCNet based on ESPNet architecture which is one of the state-of-the-art real-time semantic segmentation network that can be easily deployed on edge devices. We evaluated our ESCNet on the Cityscapes dataset and obtained competitive results, with a good trade-off between accuracy and computational cost. The proposed ESCNet achieves 61.5 % mean intersection over union (IoU) with only 196 K network parameters, and processes high resolution images at a rate of 164 frames per second (FPS) on a standard Titan Xp GPU. Various experimental results show that our method is reasonably accurate, light, and fast. INDEX TERMS Semantic segmentation, spatio-channel dilated convolution, efficient neural network, real-time processing.