We describe a simple way to reduce the amount of training data required by context-based models for real-time object detection. We demonstrate the feasibility of our approach in a very challenging vehicle detection scenario comprising multiple weather, environment, and lighting conditions such as rain, snow, and darkness (night). The investigation is based on a real-time detection system composed of two trainable components: an exhaustive multiscale object detector ("signal-driven detection") and a module for generating object-specific visual attention ("context models") that controls the signal-driven detection process. Both parts of the system require a significant amount of ground-truth data, which must be generated by human annotation in a time-consuming and costly process. Assuming sufficient training examples for signal-driven detection, we demonstrate that a co-training step can eliminate the need for separate ground-truth data to train the context models. This is achieved by training the context models directly on the results of signal-driven detection. We show that this process is feasible for different qualities of signal-driven detection and that it maintains the performance gains provided by context models. As it is by now widely accepted that signal-driven object detection can be significantly improved by context models, our method makes it possible to train substantially improved detection systems without additional annotation labor and, above all, without additional cost.
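To make the co-training step concrete, the following is a minimal sketch of how a trained detector's output can stand in for human annotation when training the context model. All names here (the detector's `detect` method, the context model's `fit` method, the score threshold) are illustrative assumptions, not the authors' actual interfaces.

```python
# Hypothetical sketch of the co-training step: the trained signal-driven
# detector generates pseudo-ground-truth, which is then used to train the
# context model, so no separate human annotation is needed for this stage.
# All class and method names below are illustrative assumptions.

def generate_pseudo_labels(detector, unlabeled_images, score_threshold=0.5):
    """Keep confident detections as pseudo-ground-truth bounding boxes."""
    pseudo_labels = []
    for image in unlabeled_images:
        detections = detector.detect(image)  # assumed: list of (bbox, score)
        pseudo_labels.append(
            [bbox for bbox, score in detections if score >= score_threshold]
        )
    return pseudo_labels

def cotrain_context_model(context_model, detector, unlabeled_images):
    """Train the attention (context) model on detector output instead of
    human-annotated ground truth."""
    pseudo_labels = generate_pseudo_labels(detector, unlabeled_images)
    context_model.fit(unlabeled_images, pseudo_labels)  # assumed interface
    return context_model
```

The score threshold controls the trade-off between the quantity and the quality of the pseudo-labels; the abstract's claim that the process works "for different qualities of signal-driven detection" suggests the approach is not overly sensitive to this choice.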