Head movements, combined with gaze, play a fundamental role in predicting a person's actions and intentions. In unconstrained head movement settings, gaze estimation is complex, and performance can degrade significantly under variations in head-pose, gaze position, occlusion and ambient illumination. In this thesis, a framework is therefore proposed to fuse head-pose and gaze information to obtain more robust and accurate gaze estimation. Specific contributions include: a newly developed graph-based model for pupil localization and accurate estimation of the pupil center; a novel iris region descriptor based on quadtree decomposition, which works together with pupil localization for gaze estimation; kernel-based extensions and enhancements to a fusion mechanism known as Discriminative Multiple Canonical Correlation Analysis (DMCCA), used to fuse the proposed and traditional features into a refined, high-quality feature set for classification; and a newly developed methodology for head-pose features based on quadtree decompositions and geometrical moments, which better integrates roll, yaw, pitch and jawline into the overall estimation framework. Experimental results demonstrate that the proposed framework is robust to variations in illumination, occlusion and head-pose, and is calibration free. The framework was validated on several datasets, achieving angular errors of 4.5° on MPII, 4.4° on Cave, 4.8° on EYEDIAP, 5.0° on ACS, 4.1° on OSLO and 4.5° on UULM.