This paper advances a monocular sparse-SLAM algorithm through two techniques: local feature maintenance and descriptor-based sensor fusion. We present two methods for maintaining the descriptor of a local feature: pooling and bestfit. The maintenance procedure aims to produce more accurate descriptors, which increases matching performance and thereby tracking accuracy. In addition, sensors other than the camera can improve tracking robustness and accuracy via sensor fusion. State-of-the-art sensor fusion techniques fall into two categories: they either use a Kalman filter that includes sensor data in its state vector to perform a posterior pose update, or they create world-aligned image descriptors with the help of the gyroscope. This paper is the first to compare and combine these two approaches. We release a new evaluation dataset comprising 21 scenes, each with a dense ground-truth trajectory, IMU data, and camera data. The results indicate that descriptor pooling significantly improves pose accuracy. Furthermore, we show that descriptor-based sensor fusion outperforms Kalman filter-based approaches (EKF and UKF).