Object-aware systems such as Deep Shape Prior SLAM (DSP-SLAM) provide a feasible technique for creating sparse maps of the environment while representing scene objects as complete 3D models. Such systems offer a compelling solution for improving the intelligence of care robots and enriching the user experience in augmented reality (AR) applications. However, owing to the abrupt and unpredictable movements exhibited by users during AR interactions, and the need for robots to respond in real time to changing situations and commands, robust and fast sensor data processing is imperative. DSP-SLAM runs at only 10-15 fps, even though it is built on ORB-SLAM2, which can run at 30 fps. This is mainly because its instance segmentation stage has an average latency of 53 ms (18.86 fps). To improve tracking robustness, keyframes must be processed at a fast rate. We replace this stage with a state-of-the-art one-stage deep learning detector, which significantly reduces the wait time for detection-based data association during keyframe creation, and present Robust Deep Shape Prior SLAM (RDSP-SLAM). The results show that segmentation is performed in 20 ms (50 fps), while the quality of 3D object reconstruction is the same as that of DSP-SLAM. RDSP-SLAM accepts sequential RGB images at 30 fps and tracks them at a mean rate of 38 fps.
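To make the pipeline change concrete, the sketch below shows one plausible way a keyframe-creation step could call a pluggable instance segmentation backend, so that a slower two-stage model can be swapped for a faster one-stage detector without touching the data-association code. This is a minimal illustration under assumed interfaces, not the paper's implementation; the names `InstanceSegmenter`, `DummyOneStageSegmenter`, and `create_keyframe` are hypothetical, and the one-stage backend is only stubbed with its approximate 20 ms latency.

```python
# Minimal sketch (not RDSP-SLAM's actual code) of a pluggable segmentation
# backend used during keyframe creation. All class/function names are
# hypothetical; a real one-stage backend would run a single forward pass
# that predicts boxes, classes, and masks jointly.
import time
from dataclasses import dataclass
from typing import List, Protocol

import numpy as np


@dataclass
class Detection:
    """A single 2D instance used for detection-based data association."""
    class_id: int
    score: float
    mask: np.ndarray   # HxW boolean instance mask
    bbox: tuple        # (x1, y1, x2, y2)


class InstanceSegmenter(Protocol):
    """Interface that both two-stage and one-stage backends satisfy."""
    def segment(self, rgb: np.ndarray) -> List[Detection]: ...


class DummyOneStageSegmenter:
    """Stand-in for a one-stage detector; only emulates its ~20 ms latency."""
    def __init__(self, simulated_latency_s: float = 0.020):
        self.simulated_latency_s = simulated_latency_s

    def segment(self, rgb: np.ndarray) -> List[Detection]:
        time.sleep(self.simulated_latency_s)  # stands in for model inference
        h, w, _ = rgb.shape
        return [Detection(class_id=56, score=0.9,
                          mask=np.zeros((h, w), dtype=bool),
                          bbox=(0, 0, w // 2, h // 2))]


def create_keyframe(rgb: np.ndarray, segmenter: InstanceSegmenter) -> dict:
    """Run instance segmentation when a new keyframe is spawned and attach
    the detections so object-level data association can use them later."""
    t0 = time.perf_counter()
    detections = segmenter.segment(rgb)
    latency_ms = (time.perf_counter() - t0) * 1000.0
    return {"detections": detections, "segmentation_latency_ms": latency_ms}


if __name__ == "__main__":
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    kf = create_keyframe(frame, DummyOneStageSegmenter())
    # At ~20 ms per keyframe the detector fits inside a 33 ms (30 fps) budget,
    # whereas a ~53 ms two-stage model would stall keyframe creation.
    print(f"segmentation latency: {kf['segmentation_latency_ms']:.1f} ms")
```

The design point this illustrates is simply that keyframe creation blocks on the segmenter, so any reduction in per-image inference latency (here, roughly 53 ms down to 20 ms) translates directly into faster keyframe processing and more headroom for tracking.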