Weakly supervised semantic segmentation (WSSS) using only image-level labels can greatly reduce annotation cost and has therefore attracted considerable research interest. However, its performance is still inferior to that of its fully supervised counterparts. To narrow this performance gap, we propose a saliency guided self-attention network (SGAN) to address the WSSS problem. The introduced self-attention mechanism is able to capture rich and extensive contextual information, but it may also spread attention to unexpected regions. To enable this mechanism to work effectively under weak supervision, we integrate class-agnostic saliency priors into the self-attention mechanism to prevent the attention on discriminative parts from spreading to the background. Meanwhile, we utilize class-specific attention cues as additional supervision for SGAN, which reduces the spread of attention between regions belonging to different foreground categories. The proposed approach is able to produce dense and accurate localization cues, which in turn boost segmentation performance. Experiments on the PASCAL VOC 2012 dataset show that the proposed approach outperforms all other state-of-the-art methods.