This paper introduces a comprehensive framework for detecting behaviors indicative of reduced driver concentration, leveraging multimodal image data. By integrating dedicated deep learning models, our approach systematically analyzes RGB images, depth maps, and thermal imagery to identify signs of driver drowsiness and distraction. Our contribution includes the use of state-of-the-art convolutional neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM) networks for effective feature extraction and classification across diverse distraction scenarios. Additionally, we explore several data fusion techniques and demonstrate their impact on detection accuracy. The significance of this work lies in its potential to enhance road safety by providing more reliable and efficient tools for real-time monitoring of driver attentiveness, thereby reducing the risk of accidents caused by distraction and fatigue. The proposed methods are thoroughly evaluated on a multimodal benchmark dataset, with results demonstrating their suitability for the development of safety-enhancing technologies in vehicular environments. The primary challenge addressed in this study is detecting driver states independently of lighting conditions. Our solution employs multimodal data integration, encompassing RGB, thermal, and depth images, to ensure robust and accurate monitoring regardless of external lighting variations.
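
As a concrete illustration of the pipeline the abstract describes, the following is a minimal PyTorch sketch, not the authors' implementation: one small CNN per modality (RGB, depth, thermal), feature-level fusion by concatenation, and a Bi-LSTM that classifies a short sequence of fused frame features. All class names, layer sizes, and the concatenation-based fusion strategy are illustrative assumptions.

```python
# Minimal sketch of a multimodal CNN + Bi-LSTM driver-state classifier.
# Architecture details (channel counts, feature dims, fusion by
# concatenation) are assumptions for illustration only.
import torch
import torch.nn as nn


class ModalityCNN(nn.Module):
    """Small CNN mapping one image modality to a fixed-size feature vector."""

    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B, 64, 1, 1)
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x):  # x: (B, C, H, W)
        return self.proj(self.features(x).flatten(1))  # (B, feat_dim)


class MultimodalDriverStateNet(nn.Module):
    """Feature-level fusion of RGB, depth, and thermal streams + Bi-LSTM."""

    def __init__(self, feat_dim: int = 128, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.rgb_cnn = ModalityCNN(3, feat_dim)      # RGB: 3 channels
        self.depth_cnn = ModalityCNN(1, feat_dim)    # depth map: 1 channel
        self.thermal_cnn = ModalityCNN(1, feat_dim)  # thermal: 1 channel
        self.bilstm = nn.LSTM(
            input_size=3 * feat_dim,
            hidden_size=hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, rgb, depth, thermal):
        # Each input: (B, T, C, H, W) -- a short clip per modality.
        T = rgb.shape[1]
        feats = []
        for t in range(T):
            # Feature-level fusion: concatenate per-modality features.
            fused = torch.cat(
                [
                    self.rgb_cnn(rgb[:, t]),
                    self.depth_cnn(depth[:, t]),
                    self.thermal_cnn(thermal[:, t]),
                ],
                dim=1,
            )
            feats.append(fused)
        seq = torch.stack(feats, dim=1)     # (B, T, 3 * feat_dim)
        out, _ = self.bilstm(seq)           # (B, T, 2 * hidden)
        return self.classifier(out[:, -1])  # logits from the last time step


# Usage example: a batch of 2 clips, 8 frames each, 64x64 images.
model = MultimodalDriverStateNet()
rgb = torch.randn(2, 8, 3, 64, 64)
depth = torch.randn(2, 8, 1, 64, 64)
thermal = torch.randn(2, 8, 1, 64, 64)
logits = model(rgb, depth, thermal)  # (2, n_classes)
```

Concatenation is only one of several fusion strategies the paper alludes to; early fusion (stacking modalities as input channels) or late fusion (averaging per-modality logits) can be swapped in at the marked fusion step without changing the surrounding pipeline.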