5G communication systems operating above 24 GHz have promising properties for user localization and environment mapping. Existing studies have either relied on simplified, abstract models of signal propagation and of the measurements, or adopted direct positioning approaches, which map the received waveform directly to a position. In this study, we consider an intermediate approach consisting of four phases: downlink data transmission, multi-dimensional channel estimation, channel parameter clustering, and simultaneous localization and mapping (SLAM) based on a novel likelihood function. This approach decomposes the problem into simpler steps, thereby reducing complexity. At the same time, because we consider an end-to-end processing chain, we account for a wide variety of practical impairments. Simulation results demonstrate the efficacy of the proposed approach.