Recent advancements in target tracking using Wi-Fi signals and channel state information (CSI) have significantly improved the accuracy and efficiency of tracking mobile targets. However, there remains a gap in developing a comprehensive approach that combines CSI, an unscented Kalman filter (UKF), and a sole self-attention mechanism to accurately estimate the position, velocity, and acceleration of targets in real-time. Furthermore, optimizing the computational efficiency of such approaches is necessary for their applicability in resource-constrained environments. To bridge this gap, this research study proposes a novel approach that addresses these challenges. The approach leverages CSI data collected from commodity Wi-Fi devices and incorporates a combination of the UKF and a sole self-attention mechanism. By fusing these elements, the proposed model provides instantaneous and precise estimates of the target’s position while considering factors such as acceleration and network information. The effectiveness of the proposed approach is demonstrated through extensive experiments conducted in a controlled test bed environment. The results exhibit a remarkable tracking accuracy level of 97%, affirming the model’s ability to successfully track mobile targets. The achieved accuracy showcases the potential of the proposed approach for applications in human-computer interactions, surveillance, and security.