“…[7], [11], [13], [15], and body movements (such as hand movements) [15]-[17]. However, these existing vision-based systems have several limitations: (i) the capturing sensors used by the aforementioned systems are either expensive camera(s) [7]-[10], [14], [15], [17], possibly combined with additional sensors/hardware [11], [12], [16], [17], or specialized imaging sensors (e.g., an eye tracker [6] or Kinect [13]); (ii) some systems rely on only a single parameter, such as the pupil [12], PERCLOS [10], or head pose [14], to estimate the driver's attentional state, making them unable to adapt to situations that are common in real driving (for example, turning the head or wearing sunglasses can hide the eyes) and leading to incorrect attentional-state detection; (iii) some existing systems detect only a single inattentional state [8]-[14], [17] or are limited to levels of the same state [6], whereas others focus on detecting the driver's activity [15], [16]; (iv) some systems provide no alert to warn the driver when an inattentional state is detected during driving [6]-[8], [10], [11], [14]-[16]; (v) some previous works [11], [15], [17] were evaluated only in a simulated environment and may not work accurately in real driving scenarios; and (vi) no evidence was provided that the systems mentioned above work in diverse situations (e.g., drivers having different facial features such as beards, moustaches, and hairstyles, or wearing accessories (e.g.…”