The fast and robust segmentation of tongue images is a prerequisite for achieving automatic tongue diagnosis in traditional Chinese medicine. To assist tongue diagnosis in real-life scenarios, an ideal tongue segmentation method needs to capture the entire tongue body as well as its precise contours. However, the similar appearance of the tongue body, the coating, and the lips hinders the performance of most unsupervised learning methods, which primarily rely on low-level visual features. On the other hand, although supervised deep convolutional neural networks (DCNNs), which typically depend on the widely used cross-entropy loss, can achieve better accuracy, they are prone to segmenting the image into multiple trivial regions. To address both of these issues, we attempt to boost the segmentation performance of DCNNs with a novel auxiliary loss function that exploits large-margin learning for end-to-end tongue segmentation models. Specifically, we first propose a loss function that incorporates interclass and intraclass costs to directly measure the distance among pixels belonging to different connected regions. Then, we explore the potential of this loss function as a regularizer for different segmentation networks, such as those with attention modules and the deeply supervised network. Finally, we present a theoretical analysis of this learning scheme. Through experiments on challenging datasets, we show that the proposed approach can be easily integrated into state-of-the-art networks to boost their performance on the tongue segmentation task without bells and whistles.
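The abstract does not give the exact formula for the proposed loss, but the general idea of combining an intraclass cost (pulling pixels of the same region together in feature space) with an interclass margin cost (pushing distinct regions apart) can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual loss: the function name `margin_region_loss`, the centroid-based formulation, and the `margin` parameter are all assumptions chosen for illustration.

```python
import numpy as np

def margin_region_loss(embeddings, labels, margin=1.0):
    """Hypothetical intra/inter-class margin loss over pixel embeddings.

    embeddings: (N, D) array of per-pixel feature vectors
    labels:     (N,) integer region labels (e.g., connected regions)

    The intraclass term pulls each pixel toward its region centroid;
    the interclass term penalizes region centroids closer than `margin`.
    This mirrors the large-margin idea described in the abstract, but the
    exact form here is an assumption, not the paper's formulation.
    """
    classes = np.unique(labels)
    centroids = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in classes]
    )

    # Intraclass cost: mean squared distance of pixels to their centroid.
    intra = 0.0
    for i, c in enumerate(classes):
        diffs = embeddings[labels == c] - centroids[i]
        intra += np.mean(np.sum(diffs ** 2, axis=1))
    intra /= len(classes)

    # Interclass cost: hinge penalty when two centroids are within margin.
    inter, pairs = 0.0, 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            dist = np.linalg.norm(centroids[i] - centroids[j])
            inter += max(0.0, margin - dist) ** 2
            pairs += 1
    if pairs:
        inter /= pairs
    return intra + inter
```

Used as an auxiliary term, such a loss would simply be added (with a weighting factor) to the cross-entropy loss during training, discouraging the network from producing multiple trivial, poorly separated regions.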