Abstract-The problem of tracking multiple moving speakers in indoor environments has recently received much attention. Earlier techniques were based purely on vision, but theoretical and algorithmic advances, together with a steady growth in processing speed, have led to techniques that fuse audio and visual data. Fusing multi-modal information has been shown to improve tracking performance, as well as robustness in challenging situations such as occlusions (caused by the limited field of view of the cameras or by other speakers). However, data fusion algorithms often suffer from noise corrupting the sensor measurements, which causes non-negligible detection errors. Here, a novel approach to combining audio and visual data is proposed. In our framework, audio data is employed as an aid to particle filter (PF) based visual tracking: the direction of arrival angles of the audio sources are used to reshape the typical Gaussian noise distribution of the particles in the propagation step and to weight the observation model in the measurement step. This approach is further improved by addressing a common problem of the PF. The efficiency and accuracy of the PF usually depend on the number of particles and on the noise variance used in the estimation and propagation functions for re-allocating these particles at each iteration. Both parameters are specified beforehand and kept fixed in the regular implementation of the PF, which makes the tracker unstable in practice. To address these problems, we design an algorithm that adapts both the number of particles and the noise variance based on the tracking error and the area occupied by the particles in the image.
Experiments on the AV16.3 dataset show the advantage of our proposed methods over the baseline PF method and an existing adaptive PF algorithm for tracking occluded speakers with a significantly reduced number of particles.

Index Terms-Audio-visual speaker tracking, particle filter, adaptive particle filter.
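The abstract names three ingredients: audio-guided propagation, audio-weighted observation, and adaptation of the particle count and noise variance. The sketch below illustrates one possible reading of them and is not the formulation used in the paper. It assumes that the direction of arrival has already been mapped to a horizontal image coordinate `doa_x`. The function names, the mixing factor `pull`, the Gaussian weight shape, and the linear adaptation rule with reference values `error_ref` and `area_ref` are all hypothetical choices made for illustration.

```python
import math
import random


def doa_weight(x, doa_x, spread):
    """Gaussian-shaped factor favouring particles close to the audio DOA.

    Assumed form: multiplies the visual likelihood in the measurement step.
    """
    diff = x - doa_x
    return math.exp(-0.5 * (diff / spread) ** 2)


def propagate(particles, sigma, doa_x, pull, rng):
    """Random-walk propagation with the noise mean shifted towards the DOA.

    `pull` in [0, 1] is a hypothetical mixing factor; 0 ignores the audio cue.
    Only the horizontal coordinate is biased in this sketch.
    """
    moved = []
    for x, y in particles:
        mean_x = x + pull * (doa_x - x)
        moved.append((rng.gauss(mean_x, sigma), rng.gauss(y, sigma)))
    return moved


def adapt(error, area, error_ref, area_ref, n_range, sigma_range):
    """Pick the particle count and noise std. dev. for the next iteration.

    Assumed rule: a larger tracking error or a larger particle area means a
    harder frame, so more particles and a wider search are used.
    """
    n_min, n_max = n_range
    s_min, s_max = sigma_range
    difficulty = 0.5 * (error / error_ref + area / area_ref)
    difficulty = max(0.0, min(1.0, difficulty))  # clamp to [0, 1]
    n = round(n_min + difficulty * (n_max - n_min))
    sigma = s_min + difficulty * (s_max - s_min)
    return n, sigma
```

In a tracking loop, `adapt` would be called once per frame with the latest error and particle area, and its outputs would set the number of particles to resample and the `sigma` passed to `propagate`. The final weight of each particle would then be its visual likelihood multiplied by `doa_weight`.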