Abstract-In this report, we propose an object learning system that fuses sensory information from an automotive radar system and a video camera. The radar system provides coarse attention cues that focus visual analysis on relatively small areas within the image plane. The attended visual areas are coded and learned by a 3-layer neural network that uses in-place learning: each neuron is responsible for learning its own processing characteristics within the connected network environment, through inhibitory and excitatory connections with other neurons. The modeled bottom-up, lateral, and top-down connections in the network enable sensory sparse coding, unsupervised learning, and supervised learning to occur concurrently. The system is applied to learning two types of objects encountered in multiple outdoor driving settings. Cross-validation results show that the overall recognition accuracy is above 95% for the radar-attended window images. Compared with the uncoded representation and with purely unsupervised learning (without top-down connections), the proposed network improves the overall recognition rate by 15.93% and 6.35%, respectively. The proposed system also compares favorably with other learning algorithms; the results indicate that it is the only one evaluated that is suitable for incremental, online object learning in a real-time driving environment.

Index Terms-Intelligent vehicle system, sensor fusion, object learning, biologically inspired neural network, sparse coding.
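To make the in-place learning idea concrete, the following is a minimal Python sketch of one such layer, written under stated assumptions: the class name InPlaceLayer, the top-k winner-take-all used to model lateral inhibition, and the simple 1/age learning-rate schedule are all illustrative choices, not the paper's exact formulation (the paper's amnesic-average schedule and connection structure may differ).

import numpy as np

class InPlaceLayer:
    # Sketch of one in-place learning layer: each neuron keeps only its
    # own weight vector and firing-age counter, and updates itself from
    # its input after competing laterally with the other neurons.

    def __init__(self, n_neurons, input_dim, top_k=1, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w = rng.normal(size=(n_neurons, input_dim))
        self.w /= np.linalg.norm(self.w, axis=1, keepdims=True)
        self.age = np.zeros(n_neurons)  # per-neuron firing age
        self.top_k = top_k              # lateral inhibition: only k winners fire

    def respond(self, x):
        x = x / (np.linalg.norm(x) + 1e-12)
        r = self.w @ x                            # bottom-up pre-response
        winners = np.argsort(r)[-self.top_k:]    # competition yields a sparse code
        y = np.zeros_like(r)
        y[winners] = r[winners]
        return y, winners, x

    def learn(self, x):
        y, winners, x = self.respond(x)
        for i in winners:
            self.age[i] += 1
            # Hebbian update with an age-dependent learning rate
            # (illustrative 1/age schedule; an assumption, not the paper's exact rule)
            lr = 1.0 / self.age[i]
            self.w[i] = (1.0 - lr) * self.w[i] + lr * y[i] * x
            self.w[i] /= np.linalg.norm(self.w[i]) + 1e-12
        return y

In this sketch, supervised (top-down) and unsupervised (bottom-up) learning can coexist by concatenating a label vector to the input x during training, which biases winner selection toward class-consistent neurons, and zeroing that portion at test time; this mirrors, in simplified form, how top-down connections enable concurrent supervised and unsupervised learning as described in the abstract.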