Although quite a few image segmentation benchmark datasets have been constructed, there is no suitable benchmark for semantic image segmentation. In the first part of this thesis, we construct a benchmark for this purpose, generating ground truths by leveraging the existing fine-grained ground truths in the Berkeley Segmentation Dataset (BSD) and by using an interactive segmentation tool for new images. We also propose a percept-tree-based region merging strategy that dynamically adapts the ground truth when evaluating a test segmentation. Moreover, we propose a new evaluation metric that is easy to understand and compute and does not require boundary matching. Experimental results show that, compared with the BSD, the generated ground-truth dataset is more suitable for evaluating semantic image segmentation, and the conducted user study demonstrates that the proposed evaluation metric matches user rankings very well.

In the second part of this thesis, we focus on a segmentation application that exploits prior information (i.e., depth in this thesis) to improve segmentation quality. To the best of our knowledge, little work has so far attempted fully automatic segmentation of RGB-D images: users are usually asked to input scribbles indicating the foreground and background, or the framework must be trained on a database to obtain a bounding box for a specified target. All of these methods require external information. We propose to incorporate Kinect shadow information into state-of-the-art algorithms for automatic foreground segmentation and multiple-object segmentation. Experimental results demonstrate that the proposed shadow-assisted segmentation methods achieve fully automatic cutout with superior segmentation performance.