Ears provide the sense of hearing, which is indispensable for humans to understand their environment. The same holds for an intelligent vehicle: acoustical data contains useful and valuable information for comprehending the environment to enable autonomous driving. One important application of acoustical data processing is sound source localization. However, due to the complexity of microphone array setups and unstable localization performance in the presence of noise, few studies have attempted to localize surrounding vehicles using on-board microphone arrays. The main contribution of this article is the first implementation that combines roof-mounted microphones and high-definition maps as a vehicle's ears for online 3D localization of surrounding vehicles. A six-microphone sensor array dedicated to the on-board application is designed, and sound source localization algorithms are improved using the observer technique to adapt to dynamic driving scenarios. A map-based localization enhancement method is also proposed to make the system more robust to noise and model errors. Feasibility tests are conducted with real vehicles in pass-by and overtaking scenarios. Experimental results validate the feasibility of using microphones to localize surrounding vehicles with lane-level accuracy.
INTRODUCTION

Acoustical data contains useful and valuable information for comprehending the environment to enable autonomous driving. However, in current research on intelligent connected vehicles (ICV), acoustical data processing is not valued as much as other perception techniques. The most widely employed sensors in ICV are microwave radar [1], LiDAR [2], and cameras [3]. These sensors provide precise and detailed information about the traffic scene for localization and detection; for this reason, they are also called the eyes of ICV. Nevertheless, in some situations, the eyes can be temporarily blinded and cause road safety disasters. For example, LiDAR is sensitive to rainy conditions, while cameras can be affected by fog, smoke, and dirt on the lens. To address these problems, many sensor fusion methods have been proposed to compensate for the limitations of a single sensor [4]. Filter-based methods, such as the EKF [5] and UKF [6], combine inertial and visual information, as sketched below. [7] and [8]
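As a brief illustration of how such filter-based fusion operates, the following minimal Python sketch fuses an inertial prediction step with a visual position update in a one-dimensional constant-velocity setting. The state model, noise levels, and the ekf_step helper are illustrative assumptions, not the specific methods of [5] or [6]; with a linear model the filter reduces to a standard Kalman filter, while the EKF and UKF variants handle nonlinear motion and measurement models.

```python
import numpy as np

# Minimal filter-based fusion sketch (assumed 1D constant-velocity model):
# the IMU supplies acceleration for the prediction step, and a camera
# supplies noisy position fixes for the update step.

dt = 0.1                                  # sample period [s]
F = np.array([[1.0, dt], [0.0, 1.0]])     # state transition (pos, vel)
B = np.array([[0.5 * dt**2], [dt]])       # control input (acceleration)
H = np.array([[1.0, 0.0]])                # camera measures position only
Q = 1e-3 * np.eye(2)                      # process noise covariance (illustrative)
R = np.array([[0.05]])                    # measurement noise covariance (illustrative)

x = np.zeros((2, 1))                      # state estimate [pos; vel]
P = np.eye(2)                             # estimate covariance

def ekf_step(x, P, accel, z):
    # Predict with the inertial measurement.
    x = F @ x + B * accel
    P = F @ P @ F.T + Q
    # Update with the visual position fix.
    y = z - H @ x                         # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# One fused step: IMU reports 0.2 m/s^2, camera reports position 0.11 m.
x, P = ekf_step(x, P, accel=0.2, z=np.array([[0.11]]))
```

The design point of such filters is that the high-rate inertial channel carries the state between the slower, drift-free visual fixes, so each sensor compensates for the other's weakness.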