Facial Action Unit (AU) detection is of major importance in a broad range of artificial intelligence applications, including healthcare, Facial Expression Recognition (FER), and mental state analysis. In this paper, we present a novel, resource-efficient facial AU detection model that embeds both spatial and channel attention mechanisms in a convolutional neural network (CNN). Combined with a data input scheme that pairs image data with binary-encoded AU activation labels, the model improves AU detection while also providing interpretability for FER systems. In contrast to existing state-of-the-art models, its streamlined architecture, together with superior performance, makes it well suited to resource-constrained environments such as mobile and embedded systems. The model was trained and evaluated on the BP4D, CK+, DISFA, FER2013+, and RAF-DB datasets; the latter two are particularly significant because they are in-the-wild facial expression recognition datasets. These datasets provide ground-truth emotion labels matched with corresponding AU activations according to the Facial Action Coding System. Evaluation with several metrics, including F1 score, accuracy, and Euclidean distance, demonstrates the model's effectiveness in both AU detection and interpretability.
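To make the architectural idea concrete, the sketch below shows one plausible way to embed channel and spatial attention in a compact CNN with a multi-label sigmoid head, one output per AU. This is a minimal CBAM-style illustration under assumed design choices, not the paper's actual architecture: the module names (`ChannelAttention`, `SpatialAttention`, `AUDetector`), layer widths, and `num_aus` parameter are all hypothetical.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention (assumed squeeze-and-excitation style): reweights
    feature channels using a globally pooled descriptor."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

class SpatialAttention(nn.Module):
    """Spatial attention (assumed CBAM style): builds a per-location gate
    from channel-pooled average and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)   # average over channels
        max_map = x.amax(dim=1, keepdim=True)   # max over channels
        gate = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * gate

class AUDetector(nn.Module):
    """Compact CNN with channel + spatial attention and a multi-label head:
    one logit per AU, matching binary-encoded AU activation labels."""
    def __init__(self, num_aus: int = 12):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            ChannelAttention(64),
            SpatialAttention(),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_aus)
        )

    def forward(self, x):
        return self.head(self.backbone(x))  # raw logits

# Multi-label training pairs an image batch with binary AU label vectors:
model = AUDetector(num_aus=12)
images = torch.randn(4, 3, 128, 128)            # dummy face crops
labels = torch.randint(0, 2, (4, 12)).float()   # binary-encoded AU activations
loss = nn.BCEWithLogitsLoss()(model(images), labels)
```

Because each AU is predicted by an independent sigmoid unit, multiple AUs can be active simultaneously, which is what allows the predicted activation pattern to be read off directly and compared (e.g., via Euclidean distance) against FACS-prescribed AU patterns for interpretability.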