Scribble labels have gained increasing attention in weakly supervised video salient object detection (VSOD). Recent scribble‐based methods propagate labels from annotated pixels to unlabeled regions using a local coherence loss, but the predicted objects often lose detail and boundary information. In this work, a novel method based on back‐foreground weight contrast is proposed that adds label enhancement points to help the model learn the edges, details, and location of salient objects. Additionally, a new VSOD framework based on global structural localization is introduced: the enhanced scribble labels first guide the model in global localization, and the trained model then finely segments the located regions. Extensive experiments demonstrate that the method achieves state‐of‐the‐art performance on common VSOD datasets, with improvements of 3.75%, 4.68%, and 0.88% in S‐measure, F‐measure, and MAE, respectively.