The computer vision community has made tremendous progress on a variety of semantic image understanding tasks, such as classification and segmentation. With advances in imaging technology and hardware, deep-learning-based semantic segmentation has become one of the most widely studied topics of the last decade. However, semantic segmentation still suffers from several drawbacks, such as inaccurate detection of object boundaries. In this study, we present a new convolutional neural network architecture, called CSU-Net, that aims to self-enhance the results of semantic segmentation. The proposed model consists of two strongly concatenated encoder-decoder blocks. This design reduces the requirements on computing power and memory size, which lowers costs and increases training and prediction speed. This study also demonstrates the advantage of the proposed system on small training datasets. The proposed approach was tested on our private dataset as well as on a publicly available dataset. To show the efficiency of the proposed system, a comparative analysis was carried out against four popular segmentation models and three other recently introduced architectures. CSU-Net outperformed all of the competing neural networks considered in the comparative study. For example, it improved the result of the traditional U-Net by approximately 50% in mean Intersection over Union (mIoU) on both tested datasets. Based on our experience, CSU-Net can improve the results of semantic segmentation in many applications.
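For readers unfamiliar with the mIoU metric quoted above, a minimal sketch follows. It assumes label maps flattened into lists of integer class indices and averages the per-class IoU over the classes that occur in either map. The function name `mean_iou` and this averaging convention are illustrative assumptions; the exact evaluation protocol of the study is not stated here.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flattened label maps.

    Assumption: classes absent from both maps are skipped rather than
    counted as IoU = 0 or 1. Other conventions exist.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both maps agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where at least one map assigns class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class c does not occur in either map
        ious.append(inter / union)

    if not ious:
        raise ValueError("no class from range(num_classes) occurs in the inputs")
    return sum(ious) / len(ious)


# Toy 2x2 example, flattened row by row.
prediction = [0, 0, 1, 1]
ground_truth = [0, 1, 1, 1]
score = mean_iou(prediction, ground_truth, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.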