Purpose

The recent innovations of Industry 4.0 have made it easy to collect data about a production environment. In this context, information about industrial equipment, gathered by suitable sensors, can be used to support predictive maintenance (PdM) through data-driven analytics based on artificial intelligence (AI) techniques. Although deep learning (DL) approaches have proven to be quite effective solutions to the problem, one open research challenge remains: the design of PdM methods that are computationally efficient and, most importantly, applicable in real-world internet of things (IoT) scenarios, where they must run directly on the devices' limited hardware.

Design/methodology/approach

In this paper, the authors propose a DL approach for the PdM task based on a particularly efficient architecture. The major novelty of the proposed framework is its use of a multi-head attention (MHA) mechanism to obtain both accurate remaining useful life (RUL) estimation and low model storage requirements, providing the basis for a possible implementation directly on the equipment hardware.

Findings

The experimental results on the NASA dataset show that the authors' approach outperforms the majority of the most widely used state-of-the-art techniques in terms of effectiveness and efficiency.

Research limitations/implications

The spatial and temporal complexity of the approach was also compared with that of a typical long short-term memory (LSTM) model and of state-of-the-art approaches on the NASA dataset.
Although the authors' approach achieves effectiveness similar to that of other approaches, it has significantly fewer parameters, a smaller storage footprint and a lower training time.

Practical implications

The proposed approach aims to find a compromise between effectiveness and efficiency, which is crucial in the industrial domain, where it is important to maximize the performance attained for the resources allocated. The overall accuracy is also on par with the best methods described in the literature.

Originality/value

The proposed approach satisfies the requirements of modern embedded AI applications (reliability, low power consumption, etc.) by finding a compromise between efficiency and effectiveness.
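
To make the parameter-count and storage comparison above more concrete, the following minimal sketch counts the trainable parameters of one standard MHA layer and one standard LSTM layer of the same width. It is an illustration only and not the authors' architecture: the layer width (`WIDTH = 64`), the 32-bit storage per parameter and the function names are assumptions chosen for the example, and the LSTM count assumes a single bias vector per gate (some frameworks keep two).

```python
def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one LSTM layer, assuming one bias vector per gate.

    Each of the four gates has input weights (hidden x input),
    recurrent weights (hidden x hidden) and a bias (hidden).
    """
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def mha_layer_params(d_model: int) -> int:
    """Parameters of one standard multi-head attention layer.

    The query, key, value and output projections are each a
    d_model x d_model matrix plus a bias; in this standard form the
    count does not depend on the number of heads.
    """
    return 4 * (d_model * d_model + d_model)


def storage_bytes(n_params: int, bytes_per_param: int = 4) -> int:
    """Raw weight storage, assuming 32-bit floats by default."""
    return n_params * bytes_per_param


WIDTH = 64  # hypothetical layer width, not taken from the paper

lstm = lstm_layer_params(WIDTH, WIDTH)
mha = mha_layer_params(WIDTH)

print(f"LSTM layer: {lstm} parameters, {storage_bytes(lstm)} bytes")
print(f"MHA layer:  {mha} parameters, {storage_bytes(mha)} bytes")
```

Under these assumptions the attention layer needs roughly half the weights of the recurrent layer at equal width. The sizes reported for the full models depend on the complete architecture and are not reproduced by this sketch.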