Vision-based methods are widely used for simultaneous localization and mapping (SLAM). Exploiting the natural acoustic landscape of the robot's environment is a promising alternative to visual SLAM: whereas visual SLAM depends on matching local features between images, distributed acoustic SLAM is based on matching acoustic events. The proposed DASLAM relies on distributed microphone arrays, where each microphone is attached to a separate, moving, controllable recording device, which requires compensating for the devices' differing clock shifts. We show that this controlled mobility is necessary to handle underdetermined cases. Estimation is performed using particle filtering. Results show that both tasks can be accomplished with good precision, even in the theoretically underdetermined cases. For example, with 2 robots and 2 sources we achieved a mapping error as low as 17.53 cm for the sound sources, a robot localization error of 18.61 cm, and a clock synchronization error of 42 μs.
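To make the estimation step concrete, the following is a minimal illustrative sketch, not the paper's implementation, of how a particle filter can jointly estimate a static source position and an inter-device clock offset from time-difference-of-arrival (TDOA) measurements while one receiver moves. The function name `run_pf`, the priors, and the noise level `sigma` are all assumptions chosen for the example.

```python
import numpy as np

C = 343.0  # speed of sound [m/s]

def run_pf(receiver_paths, tdoas, n_particles=5000, sigma=20e-6, rng=None):
    """Illustrative particle filter over (source_x, source_y, clock_offset),
    updated with one TDOA measurement per acoustic event.

    receiver_paths: list of (p1, p2) receiver positions per event
    tdoas: observed arrival-time differences t2 - t1 per event [s]
    sigma: assumed TDOA measurement noise std [s]
    """
    rng = rng or np.random.default_rng(0)
    # Assumed priors: source within a 10 m x 10 m area, offset within +-1 ms.
    parts = np.column_stack([
        rng.uniform(-5, 5, n_particles),        # source x [m]
        rng.uniform(-5, 5, n_particles),        # source y [m]
        rng.uniform(-1e-3, 1e-3, n_particles),  # clock offset [s]
    ])
    w = np.full(n_particles, 1.0 / n_particles)
    for (p1, p2), z in zip(receiver_paths, tdoas):
        d1 = np.linalg.norm(parts[:, :2] - p1, axis=1)
        d2 = np.linalg.norm(parts[:, :2] - p2, axis=1)
        pred = (d2 - d1) / C + parts[:, 2]      # predicted TDOA incl. offset
        w *= np.exp(-0.5 * ((z - pred) / sigma) ** 2)  # Gaussian likelihood
        w += 1e-300                              # guard against all-zero weights
        w /= w.sum()
        # Resample with jitter when the effective sample size drops.
        if 1.0 / np.sum(w ** 2) < n_particles / 2:
            idx = rng.choice(n_particles, n_particles, p=w)
            parts = parts[idx] + rng.normal(0, [0.02, 0.02, 1e-6],
                                            (n_particles, 3))
            w[:] = 1.0 / n_particles
    return np.average(parts, axis=0, weights=w)

# Synthetic check: source at (2, 1), 50 us offset, receiver 2 moves
# between events, which is what makes the problem well determined.
src, off = np.array([2.0, 1.0]), 50e-6
paths = [(np.array([0.0, 0.0]), np.array([3.0, -2.0 + 0.2 * k]))
         for k in range(30)]
obs = [(np.linalg.norm(src - p2) - np.linalg.norm(src - p1)) / C + off
       for p1, p2 in paths]
print(run_pf(paths, obs))  # approx. [2.0, 1.0, 5e-05]
```

With a single, static receiver pair, many (position, offset) combinations explain the same TDOA; moving one receiver between events changes the geometry and lets the filter disambiguate them, which mirrors the abstract's claim that controlled mobility is needed in underdetermined cases.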