Simultaneous localization and mapping (SLAM) is widely used by autonomous robots operating in unknown environments. The research community has developed numerous SLAM algorithms over the last 10 years, and several works have presented optimizations of these algorithms. However, they have not explored optimization of the whole system, from the hardware architecture to the algorithmic development level. New computing technologies (SIMD coprocessors, DSPs, multi-core processors) can greatly accelerate processing but require rethinking the algorithm implementation. This article presents an efficient implementation of the EKF-SLAM algorithm on a multi-processor architecture. The algorithm-architecture adequacy approach aims to optimize the implementation of the SLAM algorithm on a low-cost, heterogeneous architecture (an ARM processor with a SIMD coprocessor and a DSP core). Experiments were conducted with an instrumented platform. The results aim to demonstrate that an optimized implementation of the algorithm, resulting from an optimization methodology, can help to design embedded systems built on low-cost multiprocessor architectures operating under real-time constraints.