In this paper, we propose a framework to map stationary sound sources while simultaneously localise a moving robot. Conventional methods for localisation and sound source mapping rely on a microphone array and either, a proprioceptive sensor only (such as wheel odometry) or an additional exteroceptive sensor (such as cameras or lasers) to get accurately the robot locations. Since odometry drifts over time and sound observations are bearing-only, sparse and extremely noisy, the former can only deal with relatively short trajectories before the whole map drifts. In comparison, the latter can get more accurate trajectory estimation over long distances and a better estimation of the sound source map as a result. However, in most of the work in the literature, trajectory estimation and sound source mapping are treated as uncorrelated, which means an update on the robot trajectory does not propagate properly to the sound source map. In this paper, we proposed an efficient method to correlate robot trajectory with sound source mapping by exploiting the conditional independence property between two maps estimated by two different Simultaneous Localisation and Mapping (SLAM) algorithms running in parallel. In our approach, the first map has the flexibility that can be built with any SLAM algorithm (filtering or optimisation) to estimate robot poses with an exteroceptive sensor. The second map is built by using a filtering-based SLAM algorithm locating all stationary sound sources parametrised with Inverse Depth Parametrisation (IDP). Robot locations used during IDP initialisation are the common features shared between the two SLAM maps, which allow to propagate information accordingly. Comprehensive simulations and experimental results show the effectiveness of the proposed method.