Unsupervised video object segmentation, which operates without any object annotation or prior knowledge, remains a significant challenge. In this article, we formulate a completely unsupervised video object segmentation network, called Pop-Net, which pops out the most salient object in an input video through self-growth. Specifically, we introduce a novel self-growth strategy that helps a base segmentation network gradually grow to highlight the salient object as the video proceeds. To solve the sample generation problem in the unsupervised setting, we propose a sample generation module that fuses appearance and motion saliency. Furthermore, a sample optimization module improves the generated samples by applying contour constraints at each self-growth step. Experimental results on several datasets (DAVIS, DAVSOD, VideoSD, SegTrack-v2) demonstrate the effectiveness of the proposed method. In particular, it achieves state-of-the-art performance on completely unseen datasets (i.e., without fine-tuning on them).

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
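The abstract does not give the fusion rule used by the sample generation module; below is a minimal illustrative sketch that assumes a simple convex combination of the two saliency maps, followed by thresholding into a binary pseudo-label. The function name `fuse_saliency` and the weight `alpha` are hypothetical, not taken from the article.

```python
import numpy as np

def fuse_saliency(appearance, motion, alpha=0.5):
    """Fuse appearance and motion saliency maps into one map.

    `alpha` is a hypothetical mixing weight; the article does not
    specify its fusion rule, so a convex combination is used here
    purely for illustration.
    """
    fused = alpha * appearance + (1.0 - alpha) * motion
    # Normalize to [0, 1] so the result can serve as a soft pseudo-label.
    rng = fused.max() - fused.min()
    if rng > 0:
        fused = (fused - fused.min()) / rng
    return fused

# Toy 4x4 saliency maps in [0, 1]; the salient object sits top-right.
app = np.array([[0.1, 0.2, 0.8, 0.9],
                [0.1, 0.3, 0.9, 0.8],
                [0.0, 0.1, 0.2, 0.1],
                [0.0, 0.0, 0.1, 0.0]])
mot = np.array([[0.0, 0.1, 0.9, 1.0],
                [0.0, 0.2, 1.0, 0.9],
                [0.0, 0.0, 0.1, 0.1],
                [0.0, 0.0, 0.0, 0.0]])

# Binarize the fused map into a training sample mask.
mask = (fuse_saliency(app, mot) > 0.5).astype(np.uint8)
```

In the full pipeline such a mask would then pass through the sample optimization module, where contour constraints refine its boundary before the next self-growth step.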