<div class="section abstract"><div class="htmlview paragraph">In this paper, we propose a novel method for identifying road actor intention in autonomous systems. We use a trainable neural network based on the Transformer architecture with a masked autoencoder to analyze video sequences, eliminating the need for explicit object detection, object tracking, and other intermediate stages in order to predict the event. This prediction can be fed into the sensor fusion algorithm of any active safety system to reduce false positives and improve functional efficiency. Our approach outperforms non-Transformer-based neural network architectures on real-world driving data, offering potential for fine-grained road event understanding and for improving autonomous vehicle safety and efficiency.</div></div>