Fusion of vision-based and inertial pose estimation has many high-potential applications in navigation, robotics, and augmented reality. Our research aims at the development of a fully mobile, completely self-contained tracking system that is able to estimate sensor motion from known 3D scene structure. This requires a highly modular and scalable software architecture for algorithm design and testing. As the main contribution of this paper, we discuss the design of our hybrid tracker and emphasize its key features: scalability, code reusability, and testing facilities. In addition, we present a mobile augmented reality application and report initial experiments with a fully mobile vision-inertial sensor head. Our hybrid tracking system is not only capable of real-time operation, but can also be used for offline analysis of tracker performance, comparison with ground truth, and evaluation of different pose estimation and information fusion algorithms.