The recent prevalence of wearable cameras, some of which feature Wireless Local Area Network (WLAN) connectivity, and the abundance of mobile devices equipped with on-board cameras and WLAN modules motivate this work, which presents an indoor localization system that leverages both imagery and WLAN data to enable and support a wide variety of envisaged location-aware applications, ranging from ambient and assisted living to indoor mobile gaming and retail analytics. The proposed solution uses a fusion engine to integrate two complementary localization approaches: one based on WLAN data and one based on location-dependent image data. Two fusion strategies are developed and investigated to meet different requirements in terms of accuracy, run time, and power consumption. The first is a lightweight threshold-based approach that combines the location outputs of two localization algorithms: a WLAN-based algorithm that processes signal strength readings from the surrounding wireless infrastructure using an extended Naive Bayes approach, and an image-based algorithm that follows a novel approach built on a hierarchical vocabulary tree of SURF (Speeded Up Robust Features) descriptors. The second fusion strategy employs a particle filter that operates directly on the WLAN and image readings and also incorporates prior position estimates into the localization process. Extensive experiments with real-life data from an indoor office environment indicate that the proposed fusion strategies perform well and are competitive against both standalone WLAN- and image-based algorithms and alternative fusion localization solutions.
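Although the paper's particle filter operates directly on raw WLAN and image readings, the core fusion idea can be illustrated with a simplified sketch in which each sensor contributes a Gaussian likelihood around its own position fix. All function names, noise parameters, and the random-walk motion model below are illustrative assumptions, not the authors' actual implementation.

```python
import math
import random

def particle_filter_step(particles, wlan_est, image_est,
                         motion_std=0.5, wlan_std=3.0, image_std=1.5):
    """One predict-weight-resample cycle fusing two 2-D position estimates.

    particles  : list of (x, y) hypotheses
    wlan_est   : position fix from the WLAN algorithm (assumed Gaussian noise)
    image_est  : position fix from the image algorithm (assumed Gaussian noise)
    """
    # Predict: random-walk motion model (illustrative assumption).
    moved = [(x + random.gauss(0, motion_std), y + random.gauss(0, motion_std))
             for x, y in particles]

    # Weight: product of Gaussian likelihoods, one per sensor modality.
    def gauss(dist, std):
        return math.exp(-dist * dist / (2.0 * std * std))

    weights = []
    for x, y in moved:
        d_wlan = math.hypot(x - wlan_est[0], y - wlan_est[1])
        d_img = math.hypot(x - image_est[0], y - image_est[1])
        weights.append(gauss(d_wlan, wlan_std) * gauss(d_img, image_std))

    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / len(moved)] * len(moved)   # degenerate case: uniform
    else:
        weights = [w / total for w in weights]

    # Resample: systematic resampling to concentrate particles on likely poses.
    n = len(moved)
    step = 1.0 / n
    u = random.uniform(0.0, step)
    resampled, cum, i = [], weights[0], 0
    for _ in range(n):
        while u > cum:
            i += 1
            cum += weights[i]
        resampled.append(moved[i])
        u += step
    return resampled

def estimate(particles):
    """Posterior mean of the particle cloud as the fused location output."""
    n = len(particles)
    return (sum(p[0] for p in particles) / n,
            sum(p[1] for p in particles) / n)
```

With particles initialized uniformly over the floor plan and the loop run once per measurement epoch, the cloud collapses toward a position between the two sensor fixes, weighted by their assumed noise levels; the prior position information enters naturally through the particle set carried over from the previous step.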