Vision-based bronchoscopy (VB) models require registering the virtual lung model to the frames of the videobronchoscopy to provide effective guidance during biopsy. Registration can be achieved either by tracking the position and orientation of the bronchoscopy camera or by calibrating its deviation from the pose (position and orientation) simulated in the virtual lung model. Recent advances in neural networks and temporal image processing have opened new opportunities for guided bronchoscopy. However, such progress has been hindered by the lack of comparable experimental conditions. In the present paper we share a novel synthetic dataset that allows for a fair comparison of methods. Moreover, this paper investigates several neural network architectures for learning temporal information at different levels of subject personalization. To improve orientation measurement, we also present a standardized comparison framework and a novel metric for camera orientation learning. Results on the dataset show that the proposed metric and architectures, together with the standardized conditions, yield notable improvements over the current state of the art in camera pose estimation for videobronchoscopy.
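For readers unfamiliar with how camera-orientation error is typically quantified, the sketch below shows one common choice, the geodesic distance on SO(3) between predicted and ground-truth rotation matrices. This is only a generic illustration of an orientation metric; the abstract does not specify the paper's proposed metric, and the function names here are hypothetical.

```python
import numpy as np

def angular_error_deg(R_pred, R_gt):
    """Geodesic distance on SO(3) between two rotation matrices, in degrees.

    A generic orientation-error measure often used in camera pose
    estimation; NOT the specific metric proposed in the paper.
    """
    # The rotation angle of the relative rotation R_pred^T R_gt is the error.
    cos_theta = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against numerical drift outside [-1, 1].
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees (helper for the demo)."""
    t = np.radians(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])

if __name__ == "__main__":
    # A camera rotated 30 degrees about z, compared against the identity pose.
    print(angular_error_deg(np.eye(3), rot_z(30.0)))  # 30.0
```

Unlike elementwise differences of rotation matrices or Euler angles, this geodesic form returns a single interpretable angle and is invariant to the choice of world frame.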