A method for infrared and camera sensor fusion, applied to indoor positioning in intelligent spaces, is proposed in this work. The fused position is obtained with a maximum likelihood estimator operating on independent infrared and camera observations. Specific models are proposed for propagating variance from the infrared and camera observations (phase shifts and images, respectively) to their respective position estimates and to the final fused estimate. Model simulations are compared with real measurements in a setup designed to validate the system. The difference between theoretical predictions and real measurements ranges from 0.4 cm (fusion) to 2.5 cm (camera), within a 95% confidence margin. The positioning precision is at the centimeter level (sub-centimeter precision is achieved at most tested positions) in a 4 × 3 m locating cell with five infrared detectors on the ceiling and a single camera, at distances from the target of up to 5 m and 7 m, respectively. Given the low-cost system design and the results observed, the system is expected to be feasible and scalable to large real spaces.
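The abstract does not spell out the fusion step, but for independent Gaussian position estimates the maximum likelihood fusion reduces to an inverse-covariance weighted mean. The following is a minimal sketch under that assumption; the function and variable names, and the example covariances, are illustrative and not taken from the paper.

```python
import numpy as np

def ml_fuse(x_ir, cov_ir, x_cam, cov_cam):
    """Fuse two independent Gaussian position estimates.

    For independent Gaussian observations, the ML estimate is the
    inverse-covariance weighted mean, and the fused covariance is the
    inverse of the summed information matrices.
    """
    info_ir = np.linalg.inv(cov_ir)     # information matrix of IR estimate
    info_cam = np.linalg.inv(cov_cam)   # information matrix of camera estimate
    cov_fused = np.linalg.inv(info_ir + info_cam)
    x_fused = cov_fused @ (info_ir @ x_ir + info_cam @ x_cam)
    return x_fused, cov_fused

# Hypothetical 2-D position estimates in metres (not measured values).
x_ir = np.array([1.02, 2.10])           # estimate from IR phase shifts
cov_ir = np.diag([0.02**2, 0.03**2])    # variance propagated from IR model
x_cam = np.array([1.05, 2.04])          # estimate from camera image
cov_cam = np.diag([0.04**2, 0.02**2])   # variance propagated from camera model

x_fused, cov_fused = ml_fuse(x_ir, cov_ir, x_cam, cov_cam)
print(x_fused, np.sqrt(np.diag(cov_fused)))  # fused position and 1-sigma bounds
```

Because the fused covariance is the inverse of the summed information matrices, it is never larger than either input covariance, which is consistent with the fused estimate showing the smallest prediction error in the reported results.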