“…Scene segmentation (also known as scene parsing or semantic segmentation) is one of the fundamental problems in computer vision and has drawn considerable attention. Recently, thanks to the great success of Convolutional Neural Networks (CNNs) in computer vision [42,68,71,52,25,72,27,80,26], many CNN-based segmentation methods have been proposed and have achieved great progress [29,22,81,83,84,70,60]. For example, Long et al. [54] introduce fully convolutional networks (FCNs), in which the fully connected layers of standard CNNs are transformed into convolutional layers.…”
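
The conversion of fully connected layers into convolutional layers mentioned in the excerpt can be illustrated with a minimal sketch. It is not the implementation from [54]; it is a pure-Python toy under stated assumptions: a single-channel 2x2 feature map, integer weights, stride-1 convolution without padding, and hypothetical helper names (`fc_forward`, `fc_to_conv_kernels`, `conv_valid`). The idea is that each fully connected unit's weight vector can be reshaped into a kernel covering the whole feature map, so the layer gives the same result on the original input size and a spatial map of scores on larger inputs.

```python
# Sketch: a fully connected (FC) layer rewritten as a convolution.
# All function names and the toy data are hypothetical, for illustration only.


def fc_forward(weights, feature_map):
    """FC layer: flatten a C x H x W map, then take one dot product per unit.

    weights[k] is a flat list of length C*H*W for output unit k.
    """
    flat = [v for channel in feature_map for row in channel for v in row]
    return [sum(w * x for w, x in zip(unit, flat)) for unit in weights]


def fc_to_conv_kernels(weights, channels, height, width):
    """Reshape each flat FC weight vector into a C x H x W convolution kernel."""
    kernels = []
    for unit in weights:
        kernel = [
            [
                [unit[(c * height + i) * width + j] for j in range(width)]
                for i in range(height)
            ]
            for c in range(channels)
        ]
        kernels.append(kernel)
    return kernels


def conv_valid(kernels, feature_map):
    """Stride-1 convolution without padding (cross-correlation, as in CNNs).

    Returns K output maps, each (H_in - H_k + 1) x (W_in - W_k + 1).
    """
    in_h, in_w = len(feature_map[0]), len(feature_map[0][0])
    outputs = []
    for kernel in kernels:
        k_h, k_w = len(kernel[0]), len(kernel[0][0])
        out = []
        for top in range(in_h - k_h + 1):
            row = []
            for left in range(in_w - k_w + 1):
                acc = 0
                for c, k_channel in enumerate(kernel):
                    for i in range(k_h):
                        for j in range(k_w):
                            acc += k_channel[i][j] * feature_map[c][top + i][left + j]
                row.append(acc)
            out.append(row)
        outputs.append(out)
    return outputs


# Toy setup: 1 channel, 2x2 feature map, 2 output units.
fc_weights = [[1, 2, 3, 4], [0, -1, 1, 0]]
small_input = [[[1, 0], [2, 1]]]

fc_out = fc_forward(fc_weights, small_input)
kernels = fc_to_conv_kernels(fc_weights, channels=1, height=2, width=2)
conv_out = conv_valid(kernels, small_input)

# On an input of the original size, each conv output map is 1x1
# and its single value equals the corresponding FC output.
same = [m[0][0] for m in conv_out] == fc_out

# On a larger 3x3 input, the same kernels produce a 2x2 score map per unit.
large_input = [[[1, 0, 2], [2, 1, 0], [0, 3, 1]]]
dense_out = conv_valid(kernels, large_input)

print(fc_out, same)
print(dense_out)
```

Because the reshaped kernels slide over the input, the converted network is no longer tied to one input size, which is what makes dense per-location prediction possible in this toy setting.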