Ground-truth data is essential for the objective evaluation of object detection methods in computer vision. Many works claim their method is robust but they support it with experiments which are not quantitatively assessed with regard some ground-truth. This is one of the main obstacles to properly evaluate and compare such methods. One of the main reasons is that creating an extensive and representative groundtruth is very time consuming, specially in the case of video sequences, where thousands of frames have to be labelled. Could such a ground-truth be generated, at least in part, automatically ? Though it may seem a contradictory question, we show that this is possible for the case of video sequences recorded from a moving vehicle. The key idea is to manually label the frames in one sequence and then be able to transfer this segmentation into another video sequence recorded at a different time on the same track, possibly under a different ambient lighting. We have carried out experiments on several video sequence pairs and quantitatively assessed the precision of the transformed ground-truth, which prove that this method is quite accurate.