Autonomous driving in urban environments is a challenging task, especially when the visibility of traffic participants is reduced in complex driving scenarios. For this reason, we investigate the advantages of cooperative perception systems for enhancing onboard perception capabilities. In this paper, we present a cooperative roadside vision system for augmenting the embedded perception of an autonomous vehicle navigating a complex urban scenario. In particular, we use an HD map to implement a map-aided tracking system that merges the information from both onboard and remote sensors. The road users detected by the onboard LiDAR are represented as bounding polygons that include the localization uncertainty, whereas, for the camera, the detected bounding boxes are projected into the map frame using a geometrically constrained optimization. We report results obtained with two experimental vehicles and a roadside camera in real traffic at a roundabout. These results quantify how cooperative data fusion extends the field of view and improves the accuracy of the pose estimation of perceived objects.
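The geometrically constrained optimization used to map camera detections is not detailed in this abstract. As a minimal sketch of the underlying geometry only, the following Python example back-projects the bottom corners of an image bounding box onto a flat ground plane (z = 0) expressed in the map frame; the intrinsic matrix K, the camera pose (R, t), and the pixel coordinates are all hypothetical values chosen for illustration, not the paper's actual calibration or method.

```python
import numpy as np

def pixel_to_ground(u, v, K, R, t):
    """Back-project pixel (u, v) onto the z = 0 ground plane of the map frame.

    K is the 3x3 camera intrinsic matrix; R and t give the camera-to-map
    rotation and the camera center in the map frame (hypothetical values).
    """
    # Viewing ray in the camera frame, then rotated into the map frame.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray_map = R @ ray_cam
    # Intersect the ray with z = 0: solve t_z + s * ray_z = 0 for s.
    s = -t[2] / ray_map[2]
    return t + s * ray_map

# Hypothetical roadside camera: 6 m above ground, pitched 30 degrees down,
# looking along the map y-axis (map frame: x right, y forward, z up).
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
p = np.deg2rad(30.0)
R = np.array([[1.0,         0.0,        0.0],
              [0.0, -np.sin(p),  np.cos(p)],
              [0.0, -np.cos(p), -np.sin(p)]])
t = np.array([0.0, 0.0, 6.0])

# Bottom corners of a detected bounding box, assumed to touch the ground:
# their back-projections give the object's footprint in the map frame.
footprint = [pixel_to_ground(u, v, K, R, t) for (u, v) in [(600, 500), (700, 500)]]
print(np.round(footprint, 2))
```

Under this flat-ground assumption, the bottom edge of a detection is taken to touch the road surface, so its back-projection yields the object's footprint in the map frame; in a full system, the detection's localization uncertainty would also be propagated through this mapping.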