In in‐vehicle driving scenarios, composite action recognition is crucial for improving safety and understanding the driver's intentions. Owing to spatial constraints and occlusion, the driver's range of motion is limited, resulting in similar action patterns that are difficult to differentiate. Additionally, collecting skeleton data that characterise the full human posture is difficult, posing further challenges for action recognition. To address these problems, a novel Graph‐Reinforcement Transformer (GR‐Former) model is proposed. Taking limited skeleton data as input, the model introduces graph structure information to directionally reinforce the self‐attention mechanism, dynamically learning and aggregating features between joints at multiple levels. It thereby constructs a richer feature vector space, enhancing its expressiveness and recognition accuracy. On the Drive & Act dataset for composite action recognition, the authors' method uses only human upper‐body skeleton data yet achieves state‐of‐the‐art performance compared with existing methods. With complete human skeleton data, it also attains excellent recognition accuracy on the NTU RGB + D and NTU RGB + D 120 datasets, demonstrating the strong generalisability of the GR‐Former. Overall, the authors' work provides a new and effective solution for driver action recognition in in‐vehicle scenarios.
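The core idea of injecting graph structure into self-attention can be illustrated with a minimal sketch. This is an assumed, common formulation (an additive adjacency bias on the attention scores); the authors' exact reinforcement scheme may differ, and all names here (`graph_reinforced_attention`, `bias_scale`) are hypothetical:

```python
import numpy as np

def graph_reinforced_attention(x, adj, w_q, w_k, w_v, bias_scale=1.0):
    """Self-attention over skeleton joints where the score matrix is
    additively biased by the joint adjacency matrix, so physically
    connected joints attend to each other more strongly.
    (Illustrative assumption, not the paper's exact formulation.)"""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # (J, d) projections
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (J, J) joint-to-joint scores
    scores = scores + bias_scale * adj             # reinforce graph-connected pairs
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v                                # aggregated joint features

# Toy example: 5 upper-body joints with feature dimension 4
rng = np.random.default_rng(0)
J, d = 5, 4
x = rng.normal(size=(J, d))
adj = np.eye(J)
for i, j in [(0, 1), (1, 2), (1, 3), (3, 4)]:      # assumed chain-like skeleton edges
    adj[i, j] = adj[j, i] = 1.0
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = graph_reinforced_attention(x, adj, w_q, w_k, w_v)
print(out.shape)  # (5, 4)
```

In practice such a bias can be applied per attention head and per layer, letting the model weight skeletal connectivity differently at multiple feature levels.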