Visual and inertial sensors are complementary, so combining them can provide fast and accurate 6-degree-of-freedom state estimation, a fundamental requirement for robotic navigation, especially for unmanned aerial vehicles, in Global Positioning System-denied environments. This article presents a computationally efficient visual-inertial fusion algorithm that separates orientation fusion from position fusion. The algorithm performs 6-degree-of-freedom state estimation from gyroscope and accelerometer measurements together with the output of a monocular visual simultaneous localisation and mapping algorithm, and it also recovers the visual scale of that monocular algorithm. The orientation fusion is based on a very efficient gradient descent algorithm, whereas the position fusion is based on a 13-state linear Kalman filter. Eliminating the magnetometer avoids the problem of magnetic distortion, which makes the system power-on-and-go once the accelerometer is factory calibrated. The resulting algorithm shows a significant reduction in computation compared with the conventional extended Kalman filter, with competitive accuracy. Moreover, separating the orientation and position fusion processes allows the algorithm to be implemented easily on two individual hardware elements, so that the two fusion processes can be executed concurrently.
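To make the idea of gradient descent orientation fusion concrete, the following is a minimal sketch of a single update step that uses only gyroscope and accelerometer readings. It is an illustration under stated assumptions, not the article's implementation: the abstract does not give the update equations, so the sketch adopts the widely used form in which the gyroscope-propagated quaternion is corrected along the normalised gradient of the gravity-alignment error. The function names (`orientation_step`, `predicted_gravity`), the gain `beta`, the time step `dt`, the `[w, x, y, z]` quaternion layout and the convention that a level, stationary accelerometer reads along +z are all assumptions made for this example. Any visual correction and the 13-state position filter are omitted.

```python
import math


def normalize(v):
    """Return v scaled to unit length (or unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return list(v) if n == 0.0 else [x / n for x in v]


def predicted_gravity(q):
    """Gravity direction in the sensor frame implied by unit quaternion q."""
    q0, q1, q2, q3 = q
    return [
        2.0 * (q1 * q3 - q0 * q2),
        2.0 * (q0 * q1 + q2 * q3),
        q0 * q0 - q1 * q1 - q2 * q2 + q3 * q3,
    ]


def orientation_step(q, gyro, accel, beta=0.5, dt=0.01):
    """One gradient descent fusion step (assumed form, hypothetical names).

    q     : unit quaternion [w, x, y, z]
    gyro  : angular rate in rad/s, sensor frame
    accel : accelerometer reading, any units (only its direction is used)
    beta  : illustrative correction gain
    dt    : illustrative sample period in seconds
    """
    q0, q1, q2, q3 = q
    gx, gy, gz = gyro
    ax, ay, az = normalize(accel)

    # Residual between predicted and measured gravity direction.
    f1 = 2.0 * (q1 * q3 - q0 * q2) - ax
    f2 = 2.0 * (q0 * q1 + q2 * q3) - ay
    f3 = 2.0 * (0.5 - q1 * q1 - q2 * q2) - az

    # Gradient of 0.5 * |f|^2 with respect to q (Jacobian transpose times f).
    grad = normalize([
        -2.0 * q2 * f1 + 2.0 * q1 * f2,
        2.0 * q3 * f1 + 2.0 * q0 * f2 - 4.0 * q1 * f3,
        -2.0 * q0 * f1 + 2.0 * q3 * f2 - 4.0 * q2 * f3,
        2.0 * q1 * f1 + 2.0 * q2 * f2,
    ])

    # Quaternion rate from the gyroscope: 0.5 * q * (0, gx, gy, gz).
    q_dot = [
        0.5 * (-q1 * gx - q2 * gy - q3 * gz),
        0.5 * (q0 * gx + q2 * gz - q3 * gy),
        0.5 * (q0 * gy - q1 * gz + q3 * gx),
        0.5 * (q0 * gz + q1 * gy - q2 * gx),
    ]

    # Integrate the gyroscope rate, corrected along the negative gradient.
    q_new = [qi + (qd - beta * g) * dt for qi, qd, g in zip(q, q_dot, grad)]
    return normalize(q_new)
```

With the gyroscope at rest and the accelerometer reading straight along +z, repeated calls pull an initially tilted quaternion towards a level attitude. The accelerometer constrains only roll and pitch, so in this sketch the heading is left to drift with the gyroscope.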