“…Over the past decade, successful scene parsing methods have relied on handcrafted local features such as colour histograms and textons [6,29,30,31,32,33], and on shallow classifiers such as Boosting [6,34], Random Forests [35,36], and Support Vector Machines [37]. Because local features have limited discriminative power, considerable effort has gone into developing probabilistic graphical models such as CRFs to enforce spatial consistency and incorporate rich contextual information [38,7,39,40]. Recently, deep learning methods typified by DCNNs have achieved state-of-the-art performance on various computer vision tasks, such as image classification and multi-class object detection.…”