In this article, we explore the potential contribution of multimodal context information to object detection in an "intelligent car". The car platform used incorporates subsystems for the detection of objects from local visual patterns, as well as for the estimation of global scene properties (sometimes termed "scene context" or simply "context") such as the shape of the road area or the 3D position of the ground plane. Annotated data recorded on this platform is publicly available as the "HRI RoadTraffic" vehicle video dataset, which forms the basis for this investigation.

To quantify the contribution of context information, we investigate whether it can be used to infer object identity with little or no reference to local patterns of visual appearance. Using a challenging vehicle detection task based on the "HRI RoadTraffic" dataset, we train selected algorithms ("context models") to estimate object identity from context information alone. In the course of our performance evaluations, we also analyze the effect of typical real-world conditions (noise, high input dimensionality, environmental variation) on context model performance.

As a principal result, we show that learning context models is feasible with all tested algorithms, and that object identity can be estimated from context information with accuracy comparable to that of local pattern recognition methods. We also find that the use of basis function representations [1] (also known as "population codes") allows the simplest, and therefore most efficient, learning methods to perform best in the benchmark, suggesting that the use of context is feasible even in systems operating under strong performance constraints.
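To make the notion of a basis function representation concrete, the sketch below expands a scalar context feature into a vector of overlapping Gaussian basis responses, a common form of population coding. The function name, the choice of Gaussian bases with evenly spaced centers, and all parameter values are illustrative assumptions for this sketch, not details taken from [1] or from the system described above.

```python
import numpy as np

def population_code(x, n_basis=10, lo=0.0, hi=1.0, sigma=None):
    """Expand a scalar feature x into a Gaussian basis-function
    ("population code") representation of dimension n_basis.

    Centers are spaced evenly over [lo, hi]; sigma defaults to the
    center spacing so that neighboring basis functions overlap.
    (Illustrative parameterization, not the paper's exact scheme.)
    """
    centers = np.linspace(lo, hi, n_basis)
    if sigma is None:
        sigma = (hi - lo) / (n_basis - 1)
    # Each component responds most strongly when x is near its center.
    return np.exp(-((x - centers) ** 2) / (2.0 * sigma ** 2))

# Example: encode a context feature normalized to [0, 1] (e.g., an
# estimated ground-plane height) as a 10-dimensional code.
code = population_code(0.37)
print(code.round(3))
```

The appeal of such a representation in this setting is that it turns a nonlinear estimation problem over raw scalar features into one that simple (e.g., linear) learners can handle, since each input value activates a distinct, smoothly overlapping subset of basis components.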