Accurate and robust indoor navigation systems are crucial in fields such as robotics and autonomous vehicles. In the absence of an absolute positioning system like GPS, no single sensor can provide an accurate and robust indoor navigation solution. This thesis tackles the indoor navigation challenge using two approaches: multi-sensor fusion and semantic information. In the first approach, visual odometry is enhanced by fusing inertial sensor and wireless ranging measurements; the fusion filter is based on an Extended Kalman Filter (EKF). Stereo vision can provide 3D positioning by triangulating visual features, but depth estimation errors and high computational cost are key challenges. The developed multi-sensor system is therefore dual-mode: stereo vision is applied first to estimate the inertial sensor biases, and once these estimates converge, the system switches to a monocular mode, which reduces computational complexity and enables tracking of faster movements at higher frame rates. Because both visual and inertial tracking drift over time, wireless ranging/positioning is integrated into the system to provide absolute global positioning and ensure overall accuracy. In the second approach, an improved Visual Simultaneous Localization and Mapping (VSLAM) solution is developed using semantic segmentation and indoor layout estimation, which are exploited to optimize the map representation and increase positioning accuracy. A testbed has been developed to collect indoor multi-sensor data and to perform experiments and analysis.
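
To make the dual-mode fusion idea concrete, the following Python sketch shows one possible (greatly simplified) EKF structure: the state holds position, velocity, and accelerometer bias; visual odometry provides position updates (stereo or monocular); wireless ranges to a known anchor bound the long-term drift; and the filter switches from stereo to monocular mode once the bias covariance has converged. The state layout, noise values, class and function names, and the convergence threshold are illustrative assumptions and not the thesis' actual implementation.

    # Minimal sketch of a dual-mode visual/inertial/ranging EKF (illustrative only).
    # Orientation is assumed known and fixed to keep the example short.
    import numpy as np

    class DualModeEKF:
        def __init__(self, dt=0.01):
            self.dt = dt
            self.x = np.zeros(9)                               # [p(3), v(3), accel bias(3)]
            self.P = np.diag([1e-2] * 3 + [1e-2] * 3 + [1e-1] * 3)
            self.mode = "stereo"                               # start in stereo mode

        def predict(self, accel_meas):
            """Propagate the state with a bias-corrected accelerometer measurement."""
            dt = self.dt
            p, v, b = self.x[:3], self.x[3:6], self.x[6:9]
            a = accel_meas - b                                 # remove estimated bias
            self.x[:3] = p + v * dt + 0.5 * a * dt**2
            self.x[3:6] = v + a * dt
            F = np.eye(9)                                      # state-transition Jacobian
            F[:3, 3:6] = np.eye(3) * dt
            F[:3, 6:9] = -0.5 * np.eye(3) * dt**2
            F[3:6, 6:9] = -np.eye(3) * dt
            Q = np.diag([1e-6] * 3 + [1e-4] * 3 + [1e-8] * 3)  # assumed process noise
            self.P = F @ self.P @ F.T + Q

        def _update(self, z, h, H, R):
            """Generic EKF measurement update."""
            y = z - h
            S = H @ self.P @ H.T + R
            K = self.P @ H.T @ np.linalg.inv(S)
            self.x = self.x + K @ y
            self.P = (np.eye(9) - K @ H) @ self.P

        def update_visual(self, p_meas, stereo=True):
            """Position fix from stereo (tighter) or monocular (looser) odometry."""
            H = np.zeros((3, 9)); H[:, :3] = np.eye(3)
            R = np.eye(3) * (0.02 if stereo else 0.05)         # assumed measurement noise
            self._update(p_meas, self.x[:3], H, R)

        def update_range(self, anchor_pos, r_meas):
            """Absolute range to a wireless anchor bounds the long-term drift."""
            d = self.x[:3] - anchor_pos
            r_pred = np.linalg.norm(d)
            H = np.zeros((1, 9)); H[0, :3] = d / max(r_pred, 1e-6)
            self._update(np.array([r_meas]), np.array([r_pred]), H, np.array([[0.1]]))

        def maybe_switch_mode(self, bias_var_thresh=1e-3):
            """Switch to monocular mode once the bias covariance has converged."""
            if self.mode == "stereo" and np.trace(self.P[6:9, 6:9]) < bias_var_thresh:
                self.mode = "mono"
            return self.mode

In a real system the prediction step runs at the IMU rate, visual updates arrive at the camera frame rate, and range updates arrive whenever a wireless measurement is available; the mode check would be evaluated after each update.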
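
For the second approach, the sketch below illustrates one way semantic segmentation can inform the map representation: features that fall on dynamic or unreliable classes are discarded, while features on structural classes (walls, floor, ceiling) are flagged as candidates for layout constraints. The class names, label-image format, and function signature are assumptions for illustration; the thesis' actual segmentation network and map structure are not reproduced here.

    # Illustrative use of per-pixel semantic labels to select VSLAM map features.
    import numpy as np

    # Labels treated as unstable (dynamic or texture-poor) versus structural ones
    # that are also useful for indoor layout estimation (assumed class names).
    UNSTABLE_CLASSES = {"person", "chair", "door"}
    STRUCTURAL_CLASSES = {"wall", "floor", "ceiling"}

    def filter_keypoints(keypoints, label_image, class_names):
        """Keep only keypoints that fall on stable semantic classes.

        keypoints   : (N, 2) array of pixel coordinates (u, v)
        label_image : (H, W) array of integer class ids from a segmentation model
        class_names : list mapping class id -> class name
        """
        kept, structural = [], []
        for u, v in keypoints.astype(int):
            name = class_names[label_image[v, u]]
            if name in UNSTABLE_CLASSES:
                continue                      # drop features on dynamic objects
            kept.append((u, v))
            if name in STRUCTURAL_CLASSES:
                structural.append((u, v))     # candidates for layout constraints
        return np.array(kept), np.array(structural)

Filtering of this kind reduces the number of landmarks that must be stored and matched, and the structural subset can be tied to the estimated room layout to constrain the pose estimate.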