Unmanned Aerial Vehicles (UAVs) are increasingly popular for indoor applications owing to their versatility and flexibility, for example the autonomous visual inspection of the inner surface of a pressure vessel. Robust and reliable position estimation is critical for completing such tasks. Visual Odometry (VO) and Visual Simultaneous Localisation and Mapping (VSLAM) allow a UAV to estimate its position in unknown environments. However, traditional feature-based VO/VSLAM systems struggle in challenging scenes such as low-illumination and textureless environments. Replacing the traditional features with deep learning-based features helps in such environments, but the efficiency of these approaches is ignored. In this work, an efficient VO system for UAV onboard platforms is developed, based on a novel lightweight feature extraction network. Deformable Convolution (DFConv) is utilised to improve the feature extraction capability. Because onboard computing capability is limited, Depthwise Separable Convolution (DWConv) is adopted both to calculate the offsets for the deformable convolution and to construct the backbone network, which improves feature extraction efficiency. Experiments on public datasets indicate that, with the feature points and descriptors detected by the proposed Convolutional Neural Network (CNN), the efficiency of the VO system on embedded platforms is improved by 30.03% while accuracy is preserved. Moreover, the proposed VO system is verified through UAV flight tests in a real-world scenario. The results show that the proposed VO system handles challenging environments in which both the latest traditional and deep learning feature-based VO/VSLAM systems fail, and that it is feasible for UAV self-localisation and autonomous navigation in confined, low-illumination and textureless indoor environments.
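To illustrate why a depthwise separable convolution is cheaper than a standard convolution when predicting deformable-convolution offsets, the following minimal sketch compares weight counts for a single layer. The channel widths (64 and 128) and the 3x3 kernel are illustrative assumptions, not values taken from this work; biases are ignored, and the function names are hypothetical helpers. The offset branch is assumed to predict two values (one x and one y displacement) per kernel sampling location, as in the common deformable convolution formulation.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def offset_channels(k: int) -> int:
    """Offset channels for a k x k deformable kernel: (dx, dy) per sampling location."""
    return 2 * k * k


# Assumed, illustrative sizes (not from the source).
C_IN, C_OUT, K = 64, 128, 3

# Offset-prediction branch: maps C_IN feature channels to 2*K*K offset channels.
offset_std = standard_conv_params(C_IN, offset_channels(K), K)
offset_dws = depthwise_separable_params(C_IN, offset_channels(K), K)

# A backbone layer mapping C_IN to C_OUT channels.
backbone_std = standard_conv_params(C_IN, C_OUT, K)
backbone_dws = depthwise_separable_params(C_IN, C_OUT, K)

if __name__ == "__main__":
    print(f"offset branch:  standard={offset_std}, separable={offset_dws}")
    print(f"backbone layer: standard={backbone_std}, separable={backbone_dws}")
```

Under these assumed sizes, the separable offset branch needs 1,728 weights instead of 10,368, and the separable backbone layer needs 8,768 instead of 73,728. Weight counts are only a proxy for runtime; the actual speed-up on an embedded platform depends on the hardware and the inference framework.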