Accurately assessing the driver's situational awareness is crucial in Level 3 (L3) autonomous driving, where the driver remains in the loop. Estimating the attention zone provides essential information about the driver's on- and off-road visual attention and determines their readiness to take over control from the autonomous agent in complicated situations. This paper proposes a two-phase pipeline that improves the explainability and accuracy of attention zone estimation by introducing an intermediate gaze regression layer, in which the relationships between the input images and the output zone labels are interpretable. In the first phase, the proposed GazeMobileNet, a lightweight deep neural network, achieved state-of-the-art performance in gaze vector estimation on the MPIIGaze dataset, with an MAE of 2.37 degrees. The model was then used to extract gaze vectors from LISA V2, a driving dataset with in-cabin attention zone labels. Because LISA V2 does not contain gaze vector labels, an unsupervised clustering approach is proposed in the second phase to categorize the driver's gaze vectors and map them to the corresponding attention zones. The proposed method demonstrated improved accuracy and robustness in the zone classification task, achieving accuracies of 75.67% and 83.08% for attention zone estimation under the "daytime without eyeglasses" and "nighttime without eyeglasses" capture conditions, respectively. Furthermore, the proposed model surpassed recent research on that dataset, with accuracies of 73.11% and 74.02% under the "daytime with eyeglasses" and "nighttime with eyeglasses" capture conditions, respectively.

INDEX TERMS Level 3 autonomy, gaze estimation, GazeMobileNet, driver's attention zone, explainable clustering.
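The abstract does not specify which clustering algorithm the second phase uses, so the sketch below is only illustrative of the general idea: gaze vectors produced by a first-phase regressor are clustered without using zone labels, and each cluster is then mapped to an attention zone by majority vote over the annotated frames. The k-means clusterer, the seven-zone count, and the placeholder arrays standing in for Phase 1 outputs and LISA V2 annotations are all assumptions, not the paper's actual method.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder data (assumption): in the paper, `gaze_vectors` would come from
# the Phase 1 gaze regressor (e.g. yaw/pitch per frame) and `zone_labels`
# from the LISA V2 in-cabin attention zone annotations.
rng = np.random.default_rng(0)
gaze_vectors = rng.normal(size=(1000, 2))    # (n_frames, 2) gaze angles
zone_labels = rng.integers(0, 7, size=1000)  # ground-truth zone per frame

# Phase 2 (assumed variant): cluster gaze vectors unsupervised, one cluster
# per candidate attention zone.
n_zones = 7
kmeans = KMeans(n_clusters=n_zones, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(gaze_vectors)

# Map each cluster to a zone by majority vote over its labelled members.
cluster_to_zone = {
    c: int(np.bincount(zone_labels[cluster_ids == c]).argmax())
    for c in range(n_zones)
}

predicted_zones = np.array([cluster_to_zone[c] for c in cluster_ids])
accuracy = (predicted_zones == zone_labels).mean()
print(f"zone accuracy: {accuracy:.2%}")
```

Because the zone decision passes through an explicit gaze-cluster assignment, the intermediate representation can be inspected directly, which reflects the explainability motivation the pipeline claims.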