Motion-based video segmentation has been studied for many years and remains challenging. A fully automated solution must solve ill-posed problems, so it is increasingly popular to keep users in the processing loop by letting them set parameters or draw mattes to guide the segmentation. When processing multiple-view videos, however, the amount of user interaction should not grow in proportion to the number of views. In this paper we present a novel sparse segmentation algorithm for two-view stereoscopic videos that maintains temporal coherence and view consistency throughout. We track feature points in both views with a generic tracker and analyse the pairwise affinity of both temporally overlapping and temporally disjoint tracks, whereas similar existing techniques exploit only the information available when tracks overlap. The use of stereo disparity also allows our technique to process feature tracks from both views jointly, yielding good view consistency in the segmentation output. To make up for the lack of high-level understanding inherent to segmentation techniques, we let the user refine the output with a split-and-merge approach, so that a desired view-consistent segmentation over many frames can be obtained in a few clicks. We present several real video examples to illustrate the versatility of our technique.