For the robotic positioning and navigation, visual odometry (VO) system is widely used. However, the errors of the traditional VO accumulate when the robot moves. Besides, this paper proposes a new framework to solve the problem of monocular VO, called MagicVO. Based on the convolutional neural network (CNN) and the bi-directional LSTM (Bi-LSTM), MagicVO outputs a 6-DoF absolute-scale pose at each position of the camera with a sequence of continuous monocular images as input. It does not only utilize the outstanding performance of CNN in extracting the rich features of image frames fully but also learns the geometric relationship from image sequences pre and post through Bi-LSTM to get a more accurate prediction. A pipeline of the MagicVO is shown in this paper. The MagicVO is an end-to-end system, and the results of the experiments on the KITTI and ETH datasets show that MagicVO has a better performance than the traditional VO systems in the accuracy of pose and the generalization ability.